TY - JOUR
T1 - Mining constrained gradients in large databases
AU - Dong, Guozhu
AU - Han, Jiawei
AU - Lam, Joyce M.W.
AU - Pei, Jian
AU - Wang, Ke
AU - Zou, Wei
N1 - Funding Information:
This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC-A3723) and the Networks of Centres of Excellence of Canada (NCE/IRIS-3). A preliminary version appeared in the Proceedings of the 2001 VLDB Conference under the title “Mining Multi-Dimensional Constrained Gradients in Data Cubes.”
PY - 2004/8
Y1 - 2004/8
N2 - Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and cells represent combinations of values for the factors. To systematically analyze data in MDS, an interesting notion, called "cubegrade" was recently introduced by lmielinski et al. which focuses on the notable changes in measures in MDS by comparing a cell (which we refer to as probe cell) with its gradient cells, namely, its ancestors, descendants, and siblings. We call such queries gradient analysis queries (GQs). Since an MDS can contain billions of cells, it is important to answer GQs efficiently. In this study, we focus on developing efficient methods for mining GQs constrained by certain (weakly) antimonotone constraints. Instead of conducting an independent gradient-cell search once per probe cell, which is inefficient due to much repeated work, we propose an efficient algorithm, LiveSet-Driven. This algorithm finds all good gradient-probe cell pairs in one search pass. It utilizes measure-value analysis and dimension-match analysis in a set-oriented manner, to achieve bidirectional pruning between the sets of hopeful probe cells and of hopeful gradient cells. Moreover, it adopts a hypertree structure and an H-cubing method to compress data and to maximize sharing of computation. Our performance study shows that this algorithm is efficient and scalable. In addition to data cubes, we extend our study to another important scenario: mining constrained gradients in transactional databases where each item is associated with some measures such as price. Such transactional databases can be viewed as sparse MDSs where items represent dimensions, although they have significantly different characteristics than data cubes. We outline efficient mining methods for this problem in this paper.
AB - Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and cells represent combinations of values for the factors. To systematically analyze data in MDS, an interesting notion, called "cubegrade" was recently introduced by lmielinski et al. which focuses on the notable changes in measures in MDS by comparing a cell (which we refer to as probe cell) with its gradient cells, namely, its ancestors, descendants, and siblings. We call such queries gradient analysis queries (GQs). Since an MDS can contain billions of cells, it is important to answer GQs efficiently. In this study, we focus on developing efficient methods for mining GQs constrained by certain (weakly) antimonotone constraints. Instead of conducting an independent gradient-cell search once per probe cell, which is inefficient due to much repeated work, we propose an efficient algorithm, LiveSet-Driven. This algorithm finds all good gradient-probe cell pairs in one search pass. It utilizes measure-value analysis and dimension-match analysis in a set-oriented manner, to achieve bidirectional pruning between the sets of hopeful probe cells and of hopeful gradient cells. Moreover, it adopts a hypertree structure and an H-cubing method to compress data and to maximize sharing of computation. Our performance study shows that this algorithm is efficient and scalable. In addition to data cubes, we extend our study to another important scenario: mining constrained gradients in transactional databases where each item is associated with some measures such as price. Such transactional databases can be viewed as sparse MDSs where items represent dimensions, although they have significantly different characteristics than data cubes. We outline efficient mining methods for this problem in this paper.
KW - Antimonotonicity
KW - Complex measures
KW - Constraint-based pruning
KW - Data cube
KW - Data mining
KW - Dimension-based pruning
KW - Gradient analysis
KW - Iceberg query
UR - http://www.scopus.com/inward/record.url?scp=4344569749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4344569749&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2004.28
DO - 10.1109/TKDE.2004.28
M3 - Article
AN - SCOPUS:4344569749
VL - 16
SP - 922
EP - 938
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 8
ER -