TY - GEN
T1 - Mapping web pages to database records via link paths
AU - Weninger, Tim
AU - Fumarola, Fabio
AU - Han, Jiawei
AU - Malerba, Donato
PY - 2010
Y1 - 2010
N2 - In this paper, we propose a new knowledge management task that aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these people have public Web pages; if we can map each database record to the appropriate Web page, the information on that page can be used to further describe the person's database record. To accomplish this goal, we employ link paths, which contain anchor texts from multiple paths through the Web that end at the Web page in question. We hypothesize that the information from these link paths can be used to generate an accurate Web-page-to-database-record mapping. Experiments on two large, real-world data sets (DBLP and IMDB for the structured data, and computer science faculty members' Web pages and official movie homepages for the Web page data) show that our method provides an accurate mapping. Finally, we conclude by issuing a call for further research on this promising new task.
AB - In this paper, we propose a new knowledge management task that aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these people have public Web pages; if we can map each database record to the appropriate Web page, the information on that page can be used to further describe the person's database record. To accomplish this goal, we employ link paths, which contain anchor texts from multiple paths through the Web that end at the Web page in question. We hypothesize that the information from these link paths can be used to generate an accurate Web-page-to-database-record mapping. Experiments on two large, real-world data sets (DBLP and IMDB for the structured data, and computer science faculty members' Web pages and official movie homepages for the Web page data) show that our method provides an accurate mapping. Finally, we conclude by issuing a call for further research on this promising new task.
KW - Link paths
KW - Mapping
KW - Semi-structured data
KW - Web
UR - http://www.scopus.com/inward/record.url?scp=78651307282&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651307282&partnerID=8YFLogxK
U2 - 10.1145/1871437.1871692
DO - 10.1145/1871437.1871692
M3 - Conference contribution
AN - SCOPUS:78651307282
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1637
EP - 1640
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Y2 - 26 October 2010 through 30 October 2010
ER -