Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets

David Krasowska, Julie Bessac, Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Lossy compression plays a growing role in scientific simulations, whose output data can span terabytes and is costly to store. Error-bounded lossy compression reduces the storage required for each simulation; however, there is no known upper bound on lossy compressibility. The correlation structures in the data, the choice of compressor, and the error bound are factors that allow larger compression ratios and improved quality metrics. Analyzing these three factors provides one direction towards quantifying lossy compressibility. As a first step, we explore statistical methods to characterize the correlation structures present in the data and their relationships, through functional regression models, to compression ratios. We observed a relationship between compression ratios and several statistics summarizing the correlation structure of the data. This is a first step towards evaluating the theoretical limits of lossy compressibility, which could eventually be used to predict compression performance and adapt compressors to the correlation structures present in the data.

Original language: English (US)
Title of host publication: Proceedings of DRBSD-7 2021
Subtitle of host publication: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 47-53
Number of pages: 7
ISBN (Electronic): 9781728186726
DOIs
State: Published - 2021
Externally published: Yes
Event: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, DRBSD-7 2021 - St. Louis, United States
Duration: Nov 14 2021 → …

Publication series

Name: Proceedings of DRBSD-7 2021: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, DRBSD-7 2021
Country/Territory: United States
City: St. Louis
Period: 11/14/21 → …

Keywords

  • Compression
  • High performance computing
  • Lossy compression
  • Statistical correlation analysis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Media Technology
