Mapping web pages to database records via link paths

Tim Weninger, Fabio Fumarola, Jiawei Han, Donato Malerba

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper we propose a new knowledge management task which aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these persons have public Web pages; if we can map the database record with the appropriate Web page then the new information could be used to further describe the person's database record. To accomplish this goal we employ link paths which contain anchor texts from multiple paths through the Web ending at the Web page in question. We hypothesize that the information from these link paths can be used to generate an accurate Web page to database record mapping. Experiments on two large, real world data sets, DBLP and IMDB for the structured data and computer science faculty members' Web pages and official movie homepages for the Web page data, show that our method does provide an accurate mapping. Finally, we conclude by issuing a call for further research on this promising new task.

Original languageEnglish (US)
Title of host publicationCIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Pages1637-1640
Number of pages4
DOIs
StatePublished - 2010
Event19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
Duration: Oct 26 2010Oct 30 2010

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Country/TerritoryCanada
CityToronto, ON
Period10/26/1010/30/10

Keywords

  • Link paths
  • Mapping
  • Semi-structured data
  • Web

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting

Fingerprint

Dive into the research topics of 'Mapping web pages to database records via link paths'. Together they form a unique fingerprint.

Cite this