Neural Network Based Silent Error Detector

Chen Wang, Nikoli Dryden, Franck Cappello, Marc Snir

Research output: Chapter in Book/Report/Conference proceedingConference contribution


As we move toward exascale platforms, silent data corruptions (SDC) are likely to occur more frequently. Such errors can lead to incorrect results. Attempts have been made to use generic algorithms to detect such errors. Such detectors have demonstrated high precision and recall for detecting errors, but only if they run immediately after an error has been injected. In this paper, we propose a neural network detector that can detect SDCs even multiple iterations after they were injected. We have evaluated our detector with 6 FLASH applications and 2 Mantevo mini-apps. Experiments show that our detector can detect more than 89% of SDCs with a false positive rate of less than 2%.

Original languageEnglish (US)
Title of host publicationProceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages11
ISBN (Electronic)9781538683194
StatePublished - Oct 29 2018
Event2018 IEEE International Conference on Cluster Computing, CLUSTER 2018 - Belfast, United Kingdom
Duration: Sep 10 2018Sep 13 2018

Publication series

NameProceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print)1552-5244


Other2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Country/TerritoryUnited Kingdom


  • Exascale computing
  • Fault tolerance
  • Silent data corruption

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing


Dive into the research topics of 'Neural Network Based Silent Error Detector'. Together they form a unique fingerprint.

Cite this