Abstract
In this tutorial we view the World Wide Web as a type of massive, decentralized database. At present, this "Web database" is presented in a manner largely devoid of any consistent meaning or schema. That is not to say that Web data lacks an underlying organization; in fact, most Web content is generated from an underlying schema-bound or otherwise structured database. Information extraction is generally concerned with the reconciliation of unstructured or semi-structured Web content with the neatly structured database paradigm. With this Web database in hand, researchers and practitioners have recently begun developing mechanisms that return structured results in response to an unstructured query. These new developments are a product of (1) record, list, and table extraction from large numbers of semi-structured Web pages, (2) integration of these disparate extraction results into a consistent form, and (3) analysis of the newly extracted and integrated Web data. Among the many fruits of this line of work is the ability of semi-structured Web data to enhance the search capabilities of a schema-bound database. Conversely, structured database records have been used to augment the Web page collections typically used by Web search engines. We will cover several key technologies and principles explored so far in the area of Web information extraction, search, and exploration.
Original language | English (US) |
---|---|
Title of host publication | WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining |
Pages | 779-780 |
Number of pages | 2 |
State | Published - 2013 |
Event | 6th ACM International Conference on Web Search and Data Mining, WSDM 2013 - Rome, Italy; Duration: Feb 4 2013 → Feb 8 2013 |
Keywords
- information extraction
- information integration
- semi-structured data
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications