Data-oriented content query system: Searching for data into text on the web

Mianwei Zhou, Tao Cheng, Kevin Chen Chuan Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine-granularity information across Web text, such as Web information extraction, typed-entity search, and question answering. To unify and generalize these efforts, this paper proposes a general search system, the Data-oriented Content Query System (DoCQS), to search directly into document contents for finding relevant values of desired data types. Motivated by the current limitations, we start by distilling the essential capabilities needed by such content querying. The capabilities call for a conceptually relational model, upon which we design a powerful Content Query Language (CQL). For efficient processing, we design novel index structures and query processing algorithms. We evaluate our proposal over two concrete domains of realistic Web corpora, demonstrating that our query language is rather flexible and expressive, and our query processing is efficient with reasonable index overhead.

Original language: English (US)
Title of host publication: WSDM 2010 - Proceedings of the 3rd ACM International Conference on Web Search and Data Mining
Pages: 121-130
Number of pages: 10
State: Published - Apr 20 2010
Event: 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010 - New York City, NY, United States
Duration: Feb 3 2010 → Feb 6 2010

Publication series

Name: WSDM 2010 - Proceedings of the 3rd ACM International Conference on Web Search and Data Mining

Other

Other: 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010
Country: United States
City: New York City, NY
Period: 2/3/10 → 2/6/10

Keywords

  • Content query
  • Content query language
  • Contextual index
  • Data oriented
  • Inverted index
  • Joint index

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
