Growing parallel paths for entity-page discovery

Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, Donato Malerba

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty member homepage) we seek to find all of the entity-pages of the same type (e.g., all faculty members in the department). To do this, we propose a web structure mining method which grows parallel paths through the web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
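The core idea of parallel paths can be illustrated on a single page. The sketch below is not the authors' algorithm, which grows paths across both the web graph and DOM trees; it is a simplified, one-hop illustration that assumes links to entity-pages of the same type share the same root-to-anchor tag path as a known seed link. The names `AnchorPathParser`, `parallel_links`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AnchorPathParser(HTMLParser):
    """Record the root-to-anchor tag path of every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []    # tags that are currently open
        self.anchors = []  # (tag_path, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.anchors.append(("/".join(self.stack), href))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parallel_links(html, seed_href):
    """Return hrefs whose DOM tag path matches that of the seed link.

    Assumption (ours, for illustration): same tag path implies same type.
    """
    parser = AnchorPathParser()
    parser.feed(html)
    seed_paths = {path for path, href in parser.anchors if href == seed_href}
    return [href for path, href in parser.anchors if path in seed_paths]


# Hypothetical department page: one navigation link and a faculty list.
PAGE = """
<html><body>
  <div><a href="/about">About</a></div>
  <ul>
    <li><a href="/~alice">Alice</a></li>
    <li><a href="/~bob">Bob</a></li>
    <li><a href="/~carol">Carol</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    print(parallel_links(PAGE, "/~alice"))
```

Given the seed `/~alice`, the sketch returns the three faculty links and skips the navigation link, because only the list items share the path `html/body/ul/li/a`. A fuller treatment would also follow links across pages, which this example does not attempt.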

Original language: English (US)
Title of host publication: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Pages: 145-146
Number of pages: 2
DOIs
State: Published - 2011
Event: 20th International Conference Companion on World Wide Web, WWW 2011 - Hyderabad, India
Duration: Mar 28 2011 → Apr 1 2011

Publication series

Name: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011

Other

Other: 20th International Conference Companion on World Wide Web, WWW 2011
Country/Territory: India
City: Hyderabad
Period: 3/28/11 → 4/1/11

Keywords

  • entity pages
  • parallel paths
  • semi-structured data
  • web structure mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
