TY - GEN
T1 - Data-oriented content query system
T2 - 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010
AU - Zhou, Mianwei
AU - Cheng, Tao
AU - Chang, Kevin Chen Chuan
PY - 2010/4/20
Y1 - 2010/4/20
N2 - As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as Web information extraction, typed-entity search, and question answering. To unify and generalize these efforts, this paper proposes a general search system - Data-oriented Content Query System (DoCQS) - to search directly into document contents for finding relevant values of desired data types. Motivated by the current limitations, we start by distilling the essential capabilities needed by such content querying. The capabilities call for a conceptually relational model, upon which we design a powerful Content Query Language (CQL). For efficient processing, we design novel index structures and query processing algorithms. We evaluate our proposal over two concrete domains of realistic Web corpora, demonstrating that our query language is rather flexible and expressive, and our query processing is efficient with reasonable index overhead.
AB - As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as Web information extraction, typed-entity search, and question answering. To unify and generalize these efforts, this paper proposes a general search system - Data-oriented Content Query System (DoCQS) - to search directly into document contents for finding relevant values of desired data types. Motivated by the current limitations, we start by distilling the essential capabilities needed by such content querying. The capabilities call for a conceptually relational model, upon which we design a powerful Content Query Language (CQL). For efficient processing, we design novel index structures and query processing algorithms. We evaluate our proposal over two concrete domains of realistic Web corpora, demonstrating that our query language is rather flexible and expressive, and our query processing is efficient with reasonable index overhead.
KW - Content query
KW - Content query language
KW - Contextual index
KW - Data oriented
KW - Inverted index
KW - Joint index
UR - http://www.scopus.com/inward/record.url?scp=77950911883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950911883&partnerID=8YFLogxK
U2 - 10.1145/1718487.1718503
DO - 10.1145/1718487.1718503
M3 - Conference contribution
AN - SCOPUS:77950911883
SN - 9781605588896
T3 - WSDM 2010 - Proceedings of the 3rd ACM International Conference on Web Search and Data Mining
SP - 121
EP - 130
BT - WSDM 2010 - Proceedings of the 3rd ACM International Conference on Web Search and Data Mining
Y2 - 3 February 2010 through 6 February 2010
ER -