Optimal chunking of large multidimensional arrays for data warehousing

E. J. Otoo, Doron Rotem, Sridhar Seshadri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications referred to as MOLAP. Such arrays are stored on disk by partitioning the large global array into fixed-size sub-arrays called chunks or tiles, which form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker [11], who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0% of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of [11] can deviate considerably from the true result as the number of dimensions increases, and may also lead to suboptimal chunk shapes.
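The metric described in the abstract can be illustrated with a minimal Python sketch. It counts the chunks that an axis-aligned query box overlaps, averages that count over a workload, and picks the best chunk shape by exhaustive search. The function names, the inclusive 0-based query representation, and the brute-force search are assumptions made for illustration only; the paper itself uses steepest descent and geometric programming, which are not reproduced here.

```python
from itertools import product
from math import prod


def chunks_touched(lo, hi, chunk_shape):
    """Count chunks overlapped by the query box [lo, hi] (inclusive, 0-based cells)."""
    # Along each dimension the query spans chunk indices lo//c .. hi//c.
    return prod(h // c - l // c + 1 for l, h, c in zip(lo, hi, chunk_shape))


def mean_chunks(workload, chunk_shape):
    """Average number of chunks retrieved over a workload of (lo, hi) queries."""
    return sum(chunks_touched(lo, hi, chunk_shape) for lo, hi in workload) / len(workload)


def candidate_shapes(array_shape, chunk_cells):
    """All integer chunk shapes holding exactly chunk_cells cells (hypothetical helper)."""
    ranges = [range(1, n + 1) for n in array_shape]
    return [s for s in product(*ranges) if prod(s) == chunk_cells]


def best_shape(workload, array_shape, chunk_cells):
    """Brute-force search; only practical for small arrays and few dimensions."""
    return min(candidate_shapes(array_shape, chunk_cells),
               key=lambda s: mean_chunks(workload, s))


if __name__ == "__main__":
    # Two row-shaped queries on a 16 x 16 array, with 16 cells per chunk.
    workload = [((0, 0), (0, 15)), ((3, 0), (3, 15))]
    print(best_shape(workload, (16, 16), 16))
```

For this row-oriented workload the search favors a chunk that is long along the second dimension, which matches the intuition that chunk shape should follow the shape of typical queries.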

Original language: English (US)
Title of host publication: CIKM 2007 Co-Located Workshops - Proceedings of DOLAP'07
Pages: 25-32
Number of pages: 8
DOIs
State: Published - 2007
Externally published: Yes
Event: 10th ACM International Workshop on Data Warehousing and OLAP, DOLAP'07 - Co-Located with CIKM 2007 - Lisboa, Portugal
Duration: Nov 6 2007 to Nov 9 2007

Publication series

Name: DOLAP: Proceedings of the ACM International Workshop on Data Warehousing and OLAP

Other

Other: 10th ACM International Workshop on Data Warehousing and OLAP, DOLAP'07 - Co-Located with CIKM 2007
Country/Territory: Portugal
City: Lisboa
Period: 11/6/07 to 11/9/07

Keywords

  • chunking
  • data warehousing
  • multi-dimensional arrays

ASJC Scopus subject areas

  • Computer Science(all)
