A Lightweight, Effective Compressibility Estimation Method for Error-bounded Lossy Compression

Arkaprabha Ganguli, Robert Underwood, Julie Bessac, David Krasowska, Jon C. Calhoun, Sheng Di, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Error-bounded lossy compression is becoming increasingly important for data-movement-intensive applications that must handle large datasets efficiently in HPC environments, and these applications often need to know the compressibility of a dataset before compressing it. However, off-the-shelf state-of-the-art lossy compressors are typically driven by error bounds, so the compression ratio cannot be known until the compression operation completes. In this paper, we propose a lightweight, robust, easy-to-train model that accurately estimates the compressibility of datasets for different lossy compressors. Our approach combines novel predictors, implemented efficiently on the GPU within a framework, that measure various notions of the spatial correlation and smoothness exploited by lossy compressors; it uses mixture model regression to improve robustness and conformal prediction to provide bounds on the estimates. We then use these models, together with a detailed analysis of speedup, to understand the tradeoffs between high speed, consistent speed, and accuracy of the methods on real applications. We evaluate our approach in the context of three key applications in which compression ratio estimation is essential.
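
As a rough illustration of how conformal prediction can attach bounds to a compression ratio estimate, the sketch below applies split conformal prediction to a simple regression. It is not the authors' implementation: it substitutes ordinary least squares on a single feature for the mixture model regression named in the abstract, runs on the CPU in pure Python rather than on the GPU, and relies on a hypothetical smoothness feature and synthetic data. All function names are assumptions made for illustration.

```python
import math
import random


def smoothness_feature(values):
    """Hypothetical predictor: mean absolute difference between neighbors."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (a stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conformal_halfwidth(abs_residuals, alpha):
    """Split conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest residual."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support the requested coverage.
        return math.inf
    return sorted(abs_residuals)[k - 1]


def estimate_with_bounds(model, halfwidth, x):
    """Return (lower, point estimate, upper) for feature value x."""
    slope, intercept = model
    center = slope * x + intercept
    return center - halfwidth, center, center + halfwidth


if __name__ == "__main__":
    # Synthetic data, invented for this sketch: rougher data compresses worse.
    random.seed(0)
    xs = [random.uniform(0.0, 1.0) for _ in range(200)]
    ys = [10.0 - 5.0 * x + random.gauss(0.0, 0.5) for x in xs]

    # Fit on one half, calibrate the interval width on the other half.
    model = fit_line(xs[:100], ys[:100])
    slope, intercept = model
    residuals = [abs(y - (slope * x + intercept))
                 for x, y in zip(xs[100:], ys[100:])]
    halfwidth = conformal_halfwidth(residuals, alpha=0.1)

    new_block = [0.0, 0.2, 0.1, 0.4, 0.3]
    print(estimate_with_bounds(model, halfwidth, smoothness_feature(new_block)))
```

The interval width comes entirely from held-out calibration residuals, so the same wrapper could in principle sit around a different regression model without changing the bounding step.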

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE International Conference on Cluster Computing, CLUSTER 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-258
Number of pages: 12
ISBN (Electronic): 9798350307924
DOIs
State: Published - 2023
Externally published: Yes
Event: 25th IEEE International Conference on Cluster Computing, CLUSTER 2023 - Santa Fe, United States
Duration: Oct 31 2023 → Nov 3 2023

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 25th IEEE International Conference on Cluster Computing, CLUSTER 2023
Country/Territory: United States
City: Santa Fe
Period: 10/31/23 → 11/3/23

Keywords

  • Compression Estimation
  • Error Bounded Lossy Compressors
  • Lossy Compression
  • Rate Distortion

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
