TY - GEN
T1 - Towards End-to-end SDC Detection for HPC Applications Equipped with Lossy Compression
AU - Li, Sihuan
AU - Di, Sheng
AU - Zhao, Kai
AU - Liang, Xin
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation's exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant No. 1619253. This research is also supported by NSF Award No. 1513201. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - Data reduction techniques are in wide demand among large-scale high-performance computing (HPC) applications because of the vast volumes of data they produce and store for post-analysis. Because lossless compressors achieve very limited compression ratios, error-bounded lossy compression has become indispensable in many HPC applications: it can significantly reduce scientific data volume with user-acceptable data distortion. Since large-scale HPC applications equipped with lossy compression must handle vast volumes of data, soft errors, or silent data corruptions (SDC), are non-negligible. Although SDC detection techniques have been studied for years, no study has targeted HPC applications with lossy compression, leaving a significant gap between these applications and confidence in their execution results. To fill this gap, this paper proposes a couple of SDC detection strategies for scientific simulations with lossy compression. Experimental results on four widely used scientific simulation datasets show that promising detection ability can still be obtained with two popular lossy compressors. Our parallel experiments with up to 1,024 cores confirm that the time overhead can be limited to within 7.9%.
AB - Data reduction techniques are in wide demand among large-scale high-performance computing (HPC) applications because of the vast volumes of data they produce and store for post-analysis. Because lossless compressors achieve very limited compression ratios, error-bounded lossy compression has become indispensable in many HPC applications: it can significantly reduce scientific data volume with user-acceptable data distortion. Since large-scale HPC applications equipped with lossy compression must handle vast volumes of data, soft errors, or silent data corruptions (SDC), are non-negligible. Although SDC detection techniques have been studied for years, no study has targeted HPC applications with lossy compression, leaving a significant gap between these applications and confidence in their execution results. To fill this gap, this paper proposes a couple of SDC detection strategies for scientific simulations with lossy compression. Experimental results on four widely used scientific simulation datasets show that promising detection ability can still be obtained with two popular lossy compressors. Our parallel experiments with up to 1,024 cores confirm that the time overhead can be limited to within 7.9%.
UR - http://www.scopus.com/inward/record.url?scp=85096205381&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096205381&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER49012.2020.00043
DO - 10.1109/CLUSTER49012.2020.00043
M3 - Conference contribution
AN - SCOPUS:85096205381
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 326
EP - 336
BT - Proceedings - 2020 IEEE International Conference on Cluster Computing, CLUSTER 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Conference on Cluster Computing, CLUSTER 2020
Y2 - 14 September 2020 through 17 September 2020
ER -