TY - GEN
T1 - Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation
AU - Zhao, Kai
AU - Di, Sheng
AU - Dmitriev, Maxim
AU - Tonellot, Thierry Laurent D.
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Funding Information:
VIII. ACKNOWLEDGMENTS This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357,and supported by the National Science Foundation under Grant No. 1619253 and No. 2003709. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than all of the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best-existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to evaluate our solution vs. all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20% 460% with the same error bound in most of the cases. ∼
AB - Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than all of the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best-existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to evaluate our solution vs. all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20% 460% with the same error bound in most of the cases. ∼
KW - Data compression
KW - Data reduction
KW - Lossy compressor
KW - Scientific data management
UR - http://www.scopus.com/inward/record.url?scp=85108361340&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108361340&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00145
DO - 10.1109/ICDE51399.2021.00145
M3 - Conference contribution
AN - SCOPUS:85108361340
T3 - Proceedings - International Conference on Data Engineering
SP - 1643
EP - 1654
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - IEEE Computer Society
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
Y2 - 19 April 2021 through 22 April 2021
ER -