Optimization of error-bounded lossy compression for hard-to-compress HPC data

Sheng Di, Franck Cappello

Research output: Contribution to journal › Article › peer-review


Since today's scientific applications are producing vast amounts of data, compressing them before storage/transmission is critical. Results from existing compressors show that HPC data sets fall into two types: highly compressible and hard to compress. In this work, we carefully design and optimize error-bounded lossy compression for hard-to-compress scientific data. We propose an optimized algorithm that adaptively partitions the HPC data into best-fit consecutive segments, each containing mutually close data values, such that the compression condition can be optimized. Another significant contribution is the optimization of the shifting offset, such that the XOR-leading-zero length between two consecutive unpredictable data points is maximized. We finally devise an adaptive method to select the best-fit compressor at runtime, maximizing the compression factor. We evaluate our solution using 13 benchmarks based on real-world scientific problems, and we compare it with 9 other state-of-the-art compressors. Experiments show that our compressor always keeps compression errors within the user-specified error bounds. Most importantly, our optimization improves the compression factor effectively, by up to 49 percent for hard-to-compress data sets, with similar compression/decompression time cost.
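The XOR-leading-zero idea from the abstract can be illustrated with a minimal Python sketch. This is illustrative only, not the authors' implementation: it counts, at byte granularity, how many leading bytes two consecutive single-precision values share, and shows how a common shifting offset can lengthen that shared prefix (a longer XOR-leading-zero run means fewer bytes need to be stored for the second value).

```python
import struct

def leading_zero_bytes(a: float, b: float) -> int:
    """Count identical leading bytes between the IEEE-754 single-precision
    bit patterns of a and b (the XOR-leading-zero length, in bytes)."""
    bits_a = struct.unpack(">I", struct.pack(">f", a))[0]
    bits_b = struct.unpack(">I", struct.pack(">f", b))[0]
    xor = bits_a ^ bits_b
    count = 0
    for shift in (24, 16, 8, 0):  # walk bytes from most significant down
        if (xor >> shift) & 0xFF:
            break
        count += 1
    return count

# Two consecutive values straddling a power of two share no leading bytes,
# because their IEEE-754 exponents differ:
print(leading_zero_bytes(1.9999, 2.0001))              # -> 0
# Shifting both by a common offset (0.5, chosen for this example) moves
# them into the same binade, so the bit patterns share a leading byte:
print(leading_zero_bytes(1.9999 + 0.5, 2.0001 + 0.5))  # -> 1
```

The offset here is hand-picked for the example; the paper's contribution is choosing such an offset systematically so that the leading-zero length is maximized across the unpredictable data points.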

Original language: English (US)
Pages (from-to): 129-143
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 1
State: Published - Jan 2018
Externally published: Yes


Keywords

  • Error-bounded lossy compression
  • Floating-point data compression
  • High performance computing
  • Scientific simulation

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics


