TY - GEN
T1 - Towards Combining Error-bounded Lossy Compression and Cryptography for Scientific Data
AU - Shan, Ruiwen
AU - Di, Sheng
AU - Calhoun, Jon C.
AU - Cappello, Franck
N1 - Funding Information:
The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant Nos. SHF-1910197, SHF-1943114, and OAC-2003709.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - In the scientific domain, extremely large amounts of data are generated by large-scale high performance computing (HPC) simulations. Storing and sending such vast volumes of data poses serious scalability and performance issues, which can be considerably mitigated by data compression techniques that significantly reduce storage size and data movement burdens. Since scientific data are being shared by scientists more and more frequently, data security methods that ensure the confidentiality, integrity, and availability of the data itself are becoming increasingly important. As such, combining compression and encryption is critical to storing large-scale datasets securely. In this work, we explore how to integrate data compression and cryptography techniques as efficiently as possible for big scientific datasets in the HPC field. We perform thorough experiments using different scientific datasets with the state-of-the-art error-bounded lossy compressor, SZ, on a real-world supercomputing environment. Experiments verify that performing encryption before lossy compression (a.k.a. the encr-cmpr method) may invalidate the advantage of compression algorithms. By contrast, executing encryption after lossy compression (a.k.a. the cmpr-encr method) keeps not only high compression ratios but also high overall execution speed. Experiments also verify that the encryption overhead under the cmpr-encr method decreases with increasing compression ratios, which indicates very good scalability.
AB - In the scientific domain, extremely large amounts of data are generated by large-scale high performance computing (HPC) simulations. Storing and sending such vast volumes of data poses serious scalability and performance issues, which can be considerably mitigated by data compression techniques that significantly reduce storage size and data movement burdens. Since scientific data are being shared by scientists more and more frequently, data security methods that ensure the confidentiality, integrity, and availability of the data itself are becoming increasingly important. As such, combining compression and encryption is critical to storing large-scale datasets securely. In this work, we explore how to integrate data compression and cryptography techniques as efficiently as possible for big scientific datasets in the HPC field. We perform thorough experiments using different scientific datasets with the state-of-the-art error-bounded lossy compressor, SZ, on a real-world supercomputing environment. Experiments verify that performing encryption before lossy compression (a.k.a. the encr-cmpr method) may invalidate the advantage of compression algorithms. By contrast, executing encryption after lossy compression (a.k.a. the cmpr-encr method) keeps not only high compression ratios but also high overall execution speed. Experiments also verify that the encryption overhead under the cmpr-encr method decreases with increasing compression ratios, which indicates very good scalability.
KW - cryptography
KW - data compression
KW - data security
UR - http://www.scopus.com/inward/record.url?scp=85123499458&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123499458&partnerID=8YFLogxK
U2 - 10.1109/HPEC49654.2021.9622874
DO - 10.1109/HPEC49654.2021.9622874
M3 - Conference contribution
AN - SCOPUS:85123499458
T3 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
BT - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
Y2 - 20 September 2021 through 24 September 2021
ER -