Abstract

This paper presents a study of long latency failures using accelerated fault injection. The data collected from the experiments are used to analyze the significance, causes, and characteristics of long latency failures caused by soft errors in the processor and the memory. The results indicate that a non-negligible portion of soft errors in the code and data memory lead to long latency failures. These failures are caused by errors with long fault activation times and by errors that cause failures only under certain runtime conditions. In contrast, less than 0.5% of soft errors in the processor registers used in kernel mode lead to a failure with a latency longer than a thousand seconds. This is due to the strong temporal locality of the register values. The study also shows that the insight obtained can be used to guide the design and placement (in the application code and/or the system) of application-specific error detectors.

Original language: English (US)
Title of host publication: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009
Pages: 23-30
Number of pages: 8
State: Published - 2009
Event: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009 - Shanghai, China
Duration: Nov 16 2009 to Nov 18 2009

Publication series

Name: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009

Other

Other: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009
Country/Territory: China
City: Shanghai
Period: 11/16/09 to 11/18/09

Keywords

  • Accelerated fault injection
  • Long latency failures
  • Operating system robustness testing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Paper title: Quantitative analysis of long latency failures in system software
