Detecting and correcting data corruption in stencil applications through multivariate interpolation

Leonardo Arturo Bautista Gomez, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

High-performance computing is a powerful tool that allows scientists to study complex natural phenomena. Extreme-scale supercomputers promise orders of magnitude higher performance compared with that of current systems. However, power constraints in future exascale systems might limit the level of resilience of those machines. In particular, data could get corrupted silently, that is, without the hardware detecting the corruption. This situation is clearly unacceptable: simulation results must be within the error margin specified by the user. In this paper, we exploit multivariate interpolation in order to detect and correct data corruption in stencil applications. We evaluate this technique with a turbulent fluid application, and we demonstrate that the prediction error using multivariate interpolation is on the order of 0.01. Our results show that this mechanism can detect and correct most important corruptions and keep the error deviation under 1% during the entire execution while injecting one corruption per minute. In addition, we stress test the detector by injecting more than ten corruptions per minute and observe that our strategy allows the application to produce results with an error deviation under 10% in such a stressful scenario.
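
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 2D grid, predicts each interior point as the mean of its four axis neighbours, and flags a point when the gap between the stored and predicted value exceeds a user-chosen tolerance. The function names, the `tolerance` parameter, and the test field are all hypothetical choices made for illustration.

```python
import math


def interpolation_residuals(grid):
    """Residual of each interior point against the mean of its four neighbours."""
    rows, cols = len(grid), len(grid[0])
    residuals = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            estimate = (grid[i - 1][j] + grid[i + 1][j]
                        + grid[i][j - 1] + grid[i][j + 1]) / 4.0
            residuals[i][j] = grid[i][j] - estimate
    return residuals


def detect_and_correct(grid, tolerance):
    """Return (corrected copy of grid, list of flagged (i, j) positions)."""
    residuals = interpolation_residuals(grid)
    rows, cols = len(grid), len(grid[0])
    corrected = [row[:] for row in grid]
    flagged = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            r = abs(residuals[i][j])
            neighbours = (residuals[i - 1][j], residuals[i + 1][j],
                          residuals[i][j - 1], residuals[i][j + 1])
            # A corrupted value also shifts its neighbours' estimates, so only
            # the local maximum of the residual is treated as the culprit.
            if r > tolerance and all(r > abs(n) for n in neighbours):
                flagged.append((i, j))
                # grid value minus residual equals the interpolated estimate.
                corrected[i][j] = grid[i][j] - residuals[i][j]
    return corrected, flagged


def smooth_field(n):
    """Hypothetical smooth test field standing in for simulation data."""
    return [[math.sin(0.1 * i) + math.cos(0.1 * j) for j in range(n)]
            for i in range(n)]


clean = smooth_field(16)
corrupted = [row[:] for row in clean]
corrupted[5][7] += 3.0  # inject one silent corruption
repaired, flagged = detect_and_correct(corrupted, tolerance=0.05)
print(flagged)  # prints [(5, 7)]
```

In this sketch the tolerance plays the role of the user-specified error margin: it has to sit above the interpolation error on clean data (about 0.005 for this test field) and below the smallest corruption worth catching. Boundary points are left unchecked here, which a real detector would need to handle.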

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 595-602
Number of pages: 8
ISBN (Electronic): 9781467365987
State: Published - Oct 26 2015
Externally published: Yes
Event: IEEE International Conference on Cluster Computing, CLUSTER 2015 - Chicago, United States
Duration: Sep 8 2015 to Sep 11 2015

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2015-October
ISSN (Print): 1552-5244

Other

Other: IEEE International Conference on Cluster Computing, CLUSTER 2015
Country/Territory: United States
City: Chicago
Period: 9/8/15 to 9/11/15

Keywords

  • Computational fluid dynamics
  • Detectors
  • Hardware
  • Interpolation
  • Prediction algorithms
  • Switches
  • Three-dimensional displays

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
