TY - GEN
T1 - Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization
AU - Zhao, Kai
AU - Di, Sheng
AU - Liang, Xin
AU - Li, Sihuan
AU - Tao, Dingwen
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant No. 1619253. This work was also supported by National Science Foundation CCF 1513201. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
Publisher Copyright:
© 2020 Owner/Author.
PY - 2020/6/23
Y1 - 2020/6/23
AB - Today's extreme-scale high-performance computing (HPC) applications are producing volumes of data too large to save or transfer because of limited storage space and I/O bandwidth. Error-bounded lossy compression is widely regarded as one of the best solutions to the big science data issue, because it can significantly reduce the data volume while strictly controlling data distortion according to user requirements. In this work, we develop an adaptive parameter optimization algorithm integrated with a series of optimization strategies for SZ, a state-of-the-art prediction-based compression model. Our contribution is threefold. (1) We exploit 2nd-order regression and 2nd-order Lorenzo predictors to significantly improve the prediction accuracy of SZ, thus substantially improving the overall compression quality. (2) We design an efficient approach for selecting the best-fit parameter setting, by conducting a comprehensive a priori compression quality analysis and exploiting an efficient online control mechanism. (3) We evaluate the compression quality and performance on a supercomputer with 4,096 cores, comparing against other state-of-the-art error-bounded lossy compressors. Experiments with multiple real-world HPC simulation datasets show that our solution can improve the compression ratio by up to 46% compared with the second-best compressor. Moreover, the parallel I/O performance is improved by up to 40% thanks to the significant reduction in data size.
KW - high-performance computing
KW - lossy compression
KW - parameter optimization
KW - rate distortion
KW - science data
UR - http://www.scopus.com/inward/record.url?scp=85088369707&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088369707&partnerID=8YFLogxK
U2 - 10.1145/3369583.3392688
DO - 10.1145/3369583.3392688
M3 - Conference contribution
AN - SCOPUS:85088369707
T3 - HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing
SP - 89
EP - 100
BT - HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery
T2 - 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2020
Y2 - 23 June 2020 through 26 June 2020
ER -