TY - GEN
T1 - MDZ
T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022
AU - Zhao, Kai
AU - Di, Sheng
AU - Perez, Danny
AU - Liang, Xin
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, and by DOE’s Advanced Scientific Research Computing Office (ASCR) under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant No. 1617488, No. 2003709, and No. 2104023/2104024. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data in that each simulation could involve a large number of atoms (up to trillions) for a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that can significantly improve the compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which can optimize the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works by using three MD simulation packages, each with multiple configurations. Experiments show that our solution has up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
AB - Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data in that each simulation could involve a large number of atoms (up to trillions) for a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that can significantly improve the compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which can optimize the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works by using three MD simulation packages, each with multiple configurations. Experiments show that our solution has up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
KW - lossy compression
KW - molecular dynamics
KW - trajectory compression
UR - http://www.scopus.com/inward/record.url?scp=85136412647&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136412647&partnerID=8YFLogxK
U2 - 10.1109/ICDE53745.2022.00007
DO - 10.1109/ICDE53745.2022.00007
M3 - Conference contribution
AN - SCOPUS:85136412647
T3 - Proceedings - International Conference on Data Engineering
SP - 27
EP - 40
BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PB - IEEE Computer Society
Y2 - 9 May 2022 through 12 May 2022
ER -