TY - GEN
T1 - Beyond pages
T2 - 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
AU - Cheng, Tao
AU - Chang, Kevin Chen-Chuan
PY - 2010
Y1 - 2010
N2 - Entity search, a significant departure from page-based retrieval, finds data, i.e., entities, embedded in documents directly and holistically across the whole collection. This paper aims at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning - entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. We systematically evaluate our framework using a prototype over a 3TB real Web corpus with 150M pages and over 20 entity types extracted. Our experiments in two concrete application settings show that our techniques achieve, on average, a speed-up of 2 to 4 orders of magnitude over the keyword-based baseline, with reasonable space overhead.
AB - Entity search, a significant departure from page-based retrieval, finds data, i.e., entities, embedded in documents directly and holistically across the whole collection. This paper aims at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning - entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. We systematically evaluate our framework using a prototype over a 3TB real Web corpus with 150M pages and over 20 entity types extracted. Our experiments in two concrete application settings show that our techniques achieve, on average, a speed-up of 2 to 4 orders of magnitude over the keyword-based baseline, with reasonable space overhead.
UR - http://www.scopus.com/inward/record.url?scp=77952277802&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952277802&partnerID=8YFLogxK
U2 - 10.1145/1739041.1739047
DO - 10.1145/1739041.1739047
M3 - Conference contribution
AN - SCOPUS:77952277802
SN - 9781605589459
T3 - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
SP - 15
EP - 26
BT - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
Y2 - 22 March 2010 through 26 March 2010
ER -