TY - GEN
T1 - Growing parallel paths for entity-page discovery
AU - Weninger, Tim
AU - Fumarola, Fabio
AU - Lin, Cindy Xide
AU - Barber, Rick
AU - Han, Jiawei
AU - Malerba, Donato
PY - 2011
Y1 - 2011
AB - In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., a department and a faculty member homepage), we seek to find all of the entity-pages of the same type (e.g., all faculty members in the department). To do this, we propose a web structure mining method that grows parallel paths through the web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
KW - entity pages
KW - parallel paths
KW - semi-structured data
KW - web structure mining
UR - http://www.scopus.com/inward/record.url?scp=79955136331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955136331&partnerID=8YFLogxK
U2 - 10.1145/1963192.1963266
DO - 10.1145/1963192.1963266
M3 - Conference contribution
AN - SCOPUS:79955136331
SN - 9781450305181
T3 - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
SP - 145
EP - 146
BT - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
T2 - 20th International Conference Companion on World Wide Web, WWW 2011
Y2 - 28 March 2011 through 1 April 2011
ER -