TY - JOUR
T1 - Computing iceberg cubes by top-down and bottom-up integration
T2 - The starcubing approach
AU - Xin, Dong
AU - Han, Jiawei
AU - Li, Xiaolei
AU - Shao, Zheng
AU - Wah, Benjamin W.
N1 - Funding Information:
This work was supported in part by US National Science Foundation grants NSF IIS-03-08215/05-13678 and NSF BDI-05-15813. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
PY - 2007/1
Y1 - 2007/1
N2 - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the Multiway) algorithm [30], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC [6], computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.
AB - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the Multiway) algorithm [30], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC [6], computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.
KW - Data mining
KW - Data warehouse
KW - Online analytical processing (OLAP)
UR - http://www.scopus.com/inward/record.url?scp=33845635228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845635228&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2007.250589
DO - 10.1109/TKDE.2007.250589
M3 - Article
AN - SCOPUS:33845635228
SN - 1041-4347
VL - 19
SP - 111
EP - 126
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 1
ER -