Star-cubing: Computing iceberg cubes by top-down and bottom-up integration

Dong Xin, Jiawei Han, Xiaolei Li, Benjamin W. Wah

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.

Original languageEnglish (US)
Title of host publicationProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
EditorsPatricia G. Selinger, Michael J. Carey, Johann Christoph Freytag, Serge Abiteboul, Peter C. Lockemann, Andreas Heuer
PublisherMorgan Kaufmann
Pages476-487
Number of pages12
ISBN (Electronic)0127224424, 9780127224428
StatePublished - Jan 1 2003
Event29th International Conference on Very Large Data Bases, VLDB 2003 - Berlin, Germany
Duration: Sep 9 2003Sep 12 2003

Publication series

NameProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003

Other

Other29th International Conference on Very Large Data Bases, VLDB 2003
CountryGermany
CityBerlin
Period9/9/039/12/03

Fingerprint

Stars
Agglomeration
Data warehouses
Sorting
Data structures
Bottom-up
Pruning
Top-down
Data warehousing
Data cube
Partitioning
Top-down approach

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Information Systems and Management
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Xin, D., Han, J., Li, X., & Wah, B. W. (2003). Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In P. G. Selinger, M. J. Carey, J. C. Freytag, S. Abiteboul, P. C. Lockemann, & A. Heuer (Eds.), Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003 (pp. 476-487). (Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003). Morgan Kaufmann.

Star-cubing : Computing iceberg cubes by top-down and bottom-up integration. / Xin, Dong; Han, Jiawei; Li, Xiaolei; Wah, Benjamin W.

Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003. ed. / Patricia G. Selinger; Michael J. Carey; Johann Christoph Freytag; Serge Abiteboul; Peter C. Lockemann; Andreas Heuer. Morgan Kaufmann, 2003. p. 476-487 (Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Xin, D, Han, J, Li, X & Wah, BW 2003, Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. in PG Selinger, MJ Carey, JC Freytag, S Abiteboul, PC Lockemann & A Heuer (eds), Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003. Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003, Morgan Kaufmann, pp. 476-487, 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, 9/9/03.
Xin D, Han J, Li X, Wah BW. Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In Selinger PG, Carey MJ, Freytag JC, Abiteboul S, Lockemann PC, Heuer A, editors, Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003. Morgan Kaufmann. 2003. p. 476-487. (Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003).
Xin, Dong ; Han, Jiawei ; Li, Xiaolei ; Wah, Benjamin W. / Star-cubing : Computing iceberg cubes by top-down and bottom-up integration. Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003. editor / Patricia G. Selinger ; Michael J. Carey ; Johann Christoph Freytag ; Serge Abiteboul ; Peter C. Lockemann ; Andreas Heuer. Morgan Kaufmann, 2003. pp. 476-487 (Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003).
@inproceedings{d01ec8e5e8cc43f0b06cfacf30ef25e8,
title = "Star-cubing: Computing iceberg cubes by top-down and bottom-up integration",
abstract = "Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.",
author = "Dong Xin and Jiawei Han and Xiaolei Li and Wah, {Benjamin W.}",
year = "2003",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003",
publisher = "Morgan Kaufmann",
pages = "476--487",
editor = "Selinger, {Patricia G.} and Carey, {Michael J.} and Freytag, {Johann Christoph} and Serge Abiteboul and Lockemann, {Peter C.} and Andreas Heuer",
booktitle = "Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003",

}

TY - GEN

T1 - Star-cubing

T2 - Computing iceberg cubes by top-down and bottom-up integration

AU - Xin, Dong

AU - Han, Jiawei

AU - Li, Xiaolei

AU - Wah, Benjamin W.

PY - 2003/1/1

Y1 - 2003/1/1

N2 - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.

AB - Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottomup. The former, represented by the Multi-Way Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure value satisfies a threshold, called iceberg condition). The latter, represented by two algorithms: BUC [6] and H-Cubing[11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques; whereas H-Cubing explores a data structure, H-Tree, for shared computation. However, none of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.

UR - http://www.scopus.com/inward/record.url?scp=85012157812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85012157812&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85012157812

T3 - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003

SP - 476

EP - 487

BT - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003

A2 - Selinger, Patricia G.

A2 - Carey, Michael J.

A2 - Freytag, Johann Christoph

A2 - Abiteboul, Serge

A2 - Lockemann, Peter C.

A2 - Heuer, Andreas

PB - Morgan Kaufmann

ER -