Beyond pages: Supporting efficient, scalable entity search with dual-inversion index

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Entity search, a significant departure from page-based retrieval, finds data (i.e., entities) embedded in documents, directly and holistically across the whole collection. This paper aims to distill and abstract the essential computation requirements of entity search. From the dual views of reasoning, entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, for efficient and scalable query processing. We systematically evaluate our framework using a prototype over a 3TB real Web corpus of 150M pages, with over 20 extracted entity types. Our experiments in two concrete application settings show that our techniques achieve, on average, a speed-up of 2 to 4 orders of magnitude over the keyword-based baseline, with reasonable space overhead.
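To make the idea of a query that returns entities (roughly, the entity-as-output view) concrete, here is a minimal Python sketch that assumes a plain document-level inverted index. The class `EntitySearchIndex`, its methods, and the sample data are all hypothetical. The sketch only illustrates the kind of keyword-plus-entity-type lookup that entity search must support. It is not the paper's dual-inversion index or its partition schemes, whose details are not given here.

```python
from collections import defaultdict


class EntitySearchIndex:
    """Hypothetical toy index, not the paper's dual-inversion design.

    keyword_postings: keyword -> set of doc ids containing it
    entity_postings:  entity type -> list of (doc id, entity value)
    """

    def __init__(self):
        self.keyword_postings = defaultdict(set)
        self.entity_postings = defaultdict(list)

    def add_document(self, doc_id, text, entities):
        # Index lowercased whitespace tokens as keywords.
        for token in text.lower().split():
            self.keyword_postings[token].add(doc_id)
        # Index each extracted entity under its type.
        for etype, value in entities:
            self.entity_postings[etype].append((doc_id, value))

    def search(self, entity_type, keywords):
        # Conjunctive match: documents containing every keyword.
        docs = None
        for kw in keywords:
            postings = self.keyword_postings.get(kw.lower(), set())
            docs = postings if docs is None else docs & postings
        if not docs:
            return {}
        # Aggregate entity values across all matching documents,
        # counting how many documents support each value.
        counts = defaultdict(int)
        for doc_id, value in self.entity_postings.get(entity_type, []):
            if doc_id in docs:
                counts[value] += 1
        return dict(counts)
```

For example, after indexing a few pages about a company's support line, `idx.search("#phone", ["amazon", "customer"])` returns each phone-number entity with the number of matching documents in which it appears. Scanning the entity postings on every query, as this toy does, is the kind of cost that specialized entity indexes aim to reduce.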

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
Pages: 15-26
Number of pages: 12
State: Published - May 19 2010
Event: 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010 - Lausanne, Switzerland
Duration: Mar 22 2010 to Mar 26 2010

Publication series

Name: Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings

Other

Other: 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
Country: Switzerland
City: Lausanne
Period: 3/22/10 to 3/26/10

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
