TY - GEN
T1 - Optimizing index for taxonomy keyword search
AU - Ding, Bolin
AU - Wang, Haixun
AU - Jin, Ruoming
AU - Han, Jiawei
AU - Wang, Zhongyuan
PY - 2012
Y1 - 2012
N2 - Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently process a keyword query whose substitutes are defined by a given taxonomy. This problem is challenging because each term in a query can have a large number of substitutes, and the original query can be rewritten into any of their combinations. We propose to build an additional index (besides inverted index) to efficiently process queries. For a query workload, we formulate an optimization problem which chooses the additional index structure, aiming at minimizing the query evaluation cost, under given index space constraints. We show the NP-hardness of the problem, and propose a pseudo-polynomial time algorithm using dynamic programming, as well as an 1 over 4(1-1/e)-approximation algorithm to solve the problem. Experimental results show that, with only 10% additional index space, our approach can greatly reduce the query evaluation cost.
AB - Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently process a keyword query whose substitutes are defined by a given taxonomy. This problem is challenging because each term in a query can have a large number of substitutes, and the original query can be rewritten into any of their combinations. We propose to build an additional index (besides inverted index) to efficiently process queries. For a query workload, we formulate an optimization problem which chooses the additional index structure, aiming at minimizing the query evaluation cost, under given index space constraints. We show the NP-hardness of the problem, and propose a pseudo-polynomial time algorithm using dynamic programming, as well as an 1 over 4(1-1/e)-approximation algorithm to solve the problem. Experimental results show that, with only 10% additional index space, our approach can greatly reduce the query evaluation cost.
KW - index
KW - keyword search
KW - materialization
KW - taxonomy
UR - http://www.scopus.com/inward/record.url?scp=84862667763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862667763&partnerID=8YFLogxK
U2 - 10.1145/2213836.2213892
DO - 10.1145/2213836.2213892
M3 - Conference contribution
AN - SCOPUS:84862667763
SN - 9781450312479
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 493
EP - 504
BT - SIGMOD '12 - Proceedings of the International Conference on Management of Data
T2 - 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12
Y2 - 21 May 2012 through 24 May 2012
ER -