POSTER: Hardening Selective Protection across Multiple Program Inputs for HPC Applications

Yafan Huang, Shengjian Guo, Sheng Di, Guanpeng Li, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With the ever-shrinking size of transistors and the increasing scale of applications, silent data corruptions (SDCs) have become a common yet serious issue in HPC applications. Selective instruction duplication (SID) is a popular fault-tolerance technique that can obtain high SDC coverage with low performance overhead, as it prioritizes the most vulnerable parts of a program for protection. However, existing studies of SID evaluate only a single program input, assuming that the error resilience of the program remains similar across inputs. This assumption leads to a drastic loss of SDC coverage from SID when the protected program runs with different inputs. Hence, we propose Sentinel, an automated compiler-based framework to mitigate the loss of SDC coverage. Evaluation results show that Sentinel can effectively mitigate the loss of SDC coverage (up to 97.00%) across multiple inputs, which significantly hardens existing SID techniques.

Original language: English (US)
Title of host publication: PPoPP 2022 - Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: Association for Computing Machinery
Pages: 437-438
Number of pages: 2
ISBN (Electronic): 9781450392044
State: Published - Apr 2 2022
Externally published: Yes
Event: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2022 - Virtual, Online, Korea, Republic of
Duration: Apr 2 2022 → Apr 6 2022

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2022
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 4/2/22 → 4/6/22

Keywords

  • compiler
  • error resilience
  • fault injection
  • high performance computing

ASJC Scopus subject areas

  • Software
