Optimizing index for taxonomy keyword search

Bolin Ding, Haixun Wang, Ruoming Jin, Jiawei Han, Zhongyuan Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution


Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently process a keyword query whose substitutes are defined by a given taxonomy. This problem is challenging because each term in a query can have a large number of substitutes, and the original query can be rewritten into any of their combinations. We propose to build an additional index (besides inverted index) to efficiently process queries. For a query workload, we formulate an optimization problem which chooses the additional index structure, aiming at minimizing the query evaluation cost, under given index space constraints. We show the NP-hardness of the problem, and propose a pseudo-polynomial time algorithm using dynamic programming, as well as an 1 over 4(1-1/e)-approximation algorithm to solve the problem. Experimental results show that, with only 10% additional index space, our approach can greatly reduce the query evaluation cost.

Original languageEnglish (US)
Title of host publicationSIGMOD '12 - Proceedings of the International Conference on Management of Data
Number of pages12
StatePublished - 2012
Event2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12 - Scottsdale, AZ, United States
Duration: May 21 2012May 24 2012

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078


Other2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12
Country/TerritoryUnited States
CityScottsdale, AZ


  • index
  • keyword search
  • materialization
  • taxonomy

ASJC Scopus subject areas

  • Software
  • Information Systems


Dive into the research topics of 'Optimizing index for taxonomy keyword search'. Together they form a unique fingerprint.

Cite this