TY - JOUR
T1 - Optimization of error-bounded lossy compression for hard-to-compress HPC data
AU - Di, Sheng
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations - the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation's exascale computing imperative. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357.
Publisher Copyright:
© 2017 IEEE. Personal use is permitted.
PY - 2018/1
Y1 - 2018/1
N2 - Since today's scientific applications are producing vast amounts of data, compressing them before storage/transmission is critical. Results of existing compressors show two types of HPC data sets: highly compressible and hard to compress. In this work, we carefully design and optimize the error-bounded lossy compression for hard-to-compress scientific data. We propose an optimized algorithm that can adaptively partition the HPC data into best-fit consecutive segments each having mutually close data values, such that the compression condition can be optimized. Another significant contribution is the optimization of shifting offset such that the XOR-leading-zero length between two consecutive unpredictable data points can be maximized. We finally devise an adaptive method to select the best-fit compressor at runtime for maximizing the compression factor. We evaluate our solution using 13 benchmarks based on real-world scientific problems, and we compare it with 9 other state-of-the-art compressors. Experiments show that our compressor can always guarantee the compression errors within the user-specified error bounds. Most importantly, our optimization can improve the compression factor effectively, by up to 49 percent for hard-to-compress data sets with similar compression/decompression time cost.
AB - Since today's scientific applications are producing vast amounts of data, compressing them before storage/transmission is critical. Results of existing compressors show two types of HPC data sets: highly compressible and hard to compress. In this work, we carefully design and optimize the error-bounded lossy compression for hard-to-compress scientific data. We propose an optimized algorithm that can adaptively partition the HPC data into best-fit consecutive segments each having mutually close data values, such that the compression condition can be optimized. Another significant contribution is the optimization of shifting offset such that the XOR-leading-zero length between two consecutive unpredictable data points can be maximized. We finally devise an adaptive method to select the best-fit compressor at runtime for maximizing the compression factor. We evaluate our solution using 13 benchmarks based on real-world scientific problems, and we compare it with 9 other state-of-the-art compressors. Experiments show that our compressor can always guarantee the compression errors within the user-specified error bounds. Most importantly, our optimization can improve the compression factor effectively, by up to 49 percent for hard-to-compress data sets with similar compression/decompression time cost.
KW - Error-bounded lossy compression
KW - Floating-point data compression
KW - High performance computing
KW - Scientific simulation
UR - http://www.scopus.com/inward/record.url?scp=85049405900&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049405900&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2017.2749300
DO - 10.1109/TPDS.2017.2749300
M3 - Article
AN - SCOPUS:85049405900
SN - 1045-9219
VL - 29
SP - 129
EP - 143
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 1
ER -