TY - GEN
T1 - Progressive and selective merge
T2 - SIGMOD 2007: ACM SIGMOD International Conference on Management of Data
AU - Xin, Dong
AU - Han, Jiawei
AU - Chang, Kevin Chen-Chuan
PY - 2007
Y1 - 2007
N2 - The family of threshold algorithm (ie, TA) has been widely studied for efficiently computing top-k queries. TA uses a sort-merge framework that assumes data lists are pre-sorted, and the ranking functions are monotone. However, in many database applications, attribute values are indexed by tree-structured indices (eg, B-tree, R-tree), and the ranking functions are not necessarily monotone. To answer top-k queries with ad-hoc ranking functions, this paper studies anindex-merge paradigm that performs progressive search over the space of joint states composed by multiple index nodes. We address two challenges for efficient query processing. First, to minimize the search complexity, we present a double-heap algorithm which supports not only progressive state search but also progressive state generation. Second, to avoid unnecessary disk access, we characterize a type of "empty-state" that does not contribute to the final results, and propose a new materialization model, join-signature, to prune empty-states. Our performance study shows that the proposed method achieves one order of magnitude speed-up over baseline solutions.
AB - The family of threshold algorithm (ie, TA) has been widely studied for efficiently computing top-k queries. TA uses a sort-merge framework that assumes data lists are pre-sorted, and the ranking functions are monotone. However, in many database applications, attribute values are indexed by tree-structured indices (eg, B-tree, R-tree), and the ranking functions are not necessarily monotone. To answer top-k queries with ad-hoc ranking functions, this paper studies anindex-merge paradigm that performs progressive search over the space of joint states composed by multiple index nodes. We address two challenges for efficient query processing. First, to minimize the search complexity, we present a double-heap algorithm which supports not only progressive state search but also progressive state generation. Second, to avoid unnecessary disk access, we characterize a type of "empty-state" that does not contribute to the final results, and propose a new materialization model, join-signature, to prune empty-states. Our performance study shows that the proposed method achieves one order of magnitude speed-up over baseline solutions.
KW - Progressive merge
KW - Selective merge
KW - Top-k query
UR - http://www.scopus.com/inward/record.url?scp=35448935270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35448935270&partnerID=8YFLogxK
U2 - 10.1145/1247480.1247494
DO - 10.1145/1247480.1247494
M3 - Conference contribution
AN - SCOPUS:35448935270
SN - 1595936866
SN - 9781595936868
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 103
EP - 114
BT - SIGMOD 2007
Y2 - 12 June 2007 through 14 June 2007
ER -