TY - GEN
T1 - Star-cubing
T2 - 29th International Conference on Very Large Data Bases, VLDB 2003
AU - Xin, Dong
AU - Han, Jiawei
AU - Li, Xiaolei
AU - Wah, Benjamin W.
PY - 2003
Y1 - 2003
N2 - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.
AB - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.
UR - http://www.scopus.com/inward/record.url?scp=85012157812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012157812&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85012157812
T3 - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
SP - 476
EP - 487
BT - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
A2 - Selinger, Patricia G.
A2 - Carey, Michael J.
A2 - Freytag, Johann Christoph
A2 - Abiteboul, Serge
A2 - Lockemann, Peter C.
A2 - Heuer, Andreas
PB - Morgan Kaufmann
Y2 - 9 September 2003 through 12 September 2003
ER -