TY - GEN
T1 - Entity relation discovery from web tables and links
AU - Lin, Cindy Xide
AU - Zhao, Bo
AU - Weninger, Tim
AU - Han, Jiawei
AU - Liu, Bing
PY - 2010
Y1 - 2010
N2 - The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical form of structured information that is pervasive on the web, and Web-scale methods that automatically extract web tables have been studied extensively [1]. Many powerful systems (e.g., OCTOPUS [4], Mesa [3]) use extracted web tables as a fundamental component. In the database vernacular, a table is defined as a set of tuples which have the same attributes. Similarly, a web table is defined as a set of rows (corresponding to database tuples) which have the same column headers (corresponding to database attributes). Therefore, to extract a web table is to extract a relation on the web. In databases, tables often contain foreign keys which refer to other tables. It follows that hyperlinks inside a web table sometimes function as foreign keys to other relations whose tuples are contained in the hyperlinks' target pages. In this paper, we explore this idea by asking: can we discover new attributes for web tables by exploring hyperlinks inside web tables? This poster proposes a solution that takes a web table as input. Frequent patterns are generated as new candidate relations by following hyperlinks in the web table. The confidence of each candidate is evaluated, and trustworthy candidates are selected to become new attributes for the table. Finally, we show the usefulness of our method by performing experiments on a variety of web domains.
AB - The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical form of structured information that is pervasive on the web, and Web-scale methods that automatically extract web tables have been studied extensively [1]. Many powerful systems (e.g., OCTOPUS [4], Mesa [3]) use extracted web tables as a fundamental component. In the database vernacular, a table is defined as a set of tuples which have the same attributes. Similarly, a web table is defined as a set of rows (corresponding to database tuples) which have the same column headers (corresponding to database attributes). Therefore, to extract a web table is to extract a relation on the web. In databases, tables often contain foreign keys which refer to other tables. It follows that hyperlinks inside a web table sometimes function as foreign keys to other relations whose tuples are contained in the hyperlinks' target pages. In this paper, we explore this idea by asking: can we discover new attributes for web tables by exploring hyperlinks inside web tables? This poster proposes a solution that takes a web table as input. Frequent patterns are generated as new candidate relations by following hyperlinks in the web table. The confidence of each candidate is evaluated, and trustworthy candidates are selected to become new attributes for the table. Finally, we show the usefulness of our method by performing experiments on a variety of web domains.
KW - entity relation discovery
KW - link
KW - web table
UR - http://www.scopus.com/inward/record.url?scp=77954588215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954588215&partnerID=8YFLogxK
U2 - 10.1145/1772690.1772846
DO - 10.1145/1772690.1772846
M3 - Conference contribution
AN - SCOPUS:77954588215
SN - 9781605587998
T3 - Proceedings of the 19th International Conference on World Wide Web, WWW '10
SP - 1145
EP - 1146
BT - Proceedings of the 19th International Conference on World Wide Web, WWW '10
T2 - 19th International World Wide Web Conference, WWW2010
Y2 - 26 April 2010 through 30 April 2010
ER -