Towards End-to-end SDC Detection for HPC Applications Equipped with Lossy Compression

Sihuan Li, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Large-scale high-performance computing (HPC) applications widely demand and use data reduction techniques because they produce vast volumes of data that must be stored for post-analysis. Because lossless compressors achieve very limited compression ratios, error-bounded lossy compression has become an indispensable part of many HPC applications: it can significantly reduce scientific data volume with user-acceptable data distortion. Since large-scale HPC applications equipped with lossy compression always need to deal with vast volumes of data, soft errors or silent data corruptions (SDC) are non-negligible. Although SDC detection techniques have been studied for years, no studies have targeted HPC applications with lossy compression, leaving a significant gap between these applications and confidence in their execution results. To fill this gap, this paper proposes a couple of SDC detection strategies for scientific simulations with lossy compression. Experimental results on 4 widely used scientific simulation datasets show that promising detection ability can still be obtained with two popular lossy compressors. Our parallel experiments with up to 1,024 cores confirm that the time overhead can be limited to within 7.9%.

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Cluster Computing, CLUSTER 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 326-336
Number of pages: 11
ISBN (Electronic): 9781728166773
State: Published - Sep 2020
Event: 22nd IEEE International Conference on Cluster Computing, CLUSTER 2020 - Kobe, Japan
Duration: Sep 14 2020 to Sep 17 2020

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2020-September
ISSN (Print): 1552-5244

Conference

Conference: 22nd IEEE International Conference on Cluster Computing, CLUSTER 2020
Country/Territory: Japan
City: Kobe
Period: 9/14/20 to 9/17/20

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
