Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling

Sian Jin, Sheng Di, Jiannan Tian, Suren Byna, Dingwen Tao, Franck Cappello

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Error-bounded lossy compression is one of the most effective techniques for reducing scientific data sizes. However, the traditional trial-and-error approach used to configure lossy compressors for finding the optimal trade-off between reconstructed data quality and compression ratio is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework, which can effectively foresee the reduced data quality and compression ratio, as well as the impact of lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves the prediction-based lossy compression in three use-cases: (1) optimization of predictor by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained tuning error-bounds for various data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% accuracy on average) and low computational cost (up to 18.7x lower than the trial-and-error approach) for estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model using different applications across the three use-cases. In addition, our experiment demonstrates that our modeling-based approach reduces the time to store the 3D RTM data with HDF5 by up to 3.4 x with 128 CPU cores over the traditional solution.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PublisherIEEE Computer Society
Pages2494-2507
Number of pages14
ISBN (Electronic)9781665408837
DOIs
StatePublished - 2022
Externally publishedYes
Event38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia
Duration: May 9 2022May 12 2022

Publication series

NameProceedings - International Conference on Data Engineering
Volume2022-May
ISSN (Print)1084-4627

Conference

Conference38th IEEE International Conference on Data Engineering, ICDE 2022
Country/TerritoryMalaysia
CityVirtual, Online
Period5/9/225/12/22

Keywords

  • Analytical modeling
  • Lossy compression
  • Scientific data

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling'. Together they form a unique fingerprint.

Cite this