Examining ACE analysis reliability estimates using fault-injection

Nicholas J. Wang, Aqeel Mahesri, Sanjay J. Patel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low-level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals. In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: from lack of detail in abstract performance models and from what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
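
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which an ACE-based vulnerability estimate is the fraction of bit-cycles that cannot be proven harmless, while a fault-injection estimate is the fraction of injected faults that cause incorrect execution. The function names and all numbers below are hypothetical and are not taken from the paper.

```python
def ace_vulnerability(ace_bit_cycles: int, total_bit_cycles: int) -> float:
    """ACE-style estimate: bit-cycles not proven harmless over all bit-cycles.

    Bits the analysis cannot classify are counted as ACE, which is what
    makes the estimate conservative.
    """
    if total_bit_cycles <= 0:
        raise ValueError("total_bit_cycles must be positive")
    return ace_bit_cycles / total_bit_cycles


def injection_vulnerability(failures: int, injections: int) -> float:
    """Fault-injection estimate: failing trials over all injected faults."""
    if injections <= 0:
        raise ValueError("injections must be positive")
    return failures / injections


def conservatism_factor(ace_estimate: float, injection_estimate: float) -> float:
    """How many times larger the ACE estimate is than the injection estimate."""
    if injection_estimate <= 0:
        raise ValueError("injection_estimate must be positive")
    return ace_estimate / injection_estimate


# Hypothetical numbers for illustration only.
ace_est = ace_vulnerability(ace_bit_cycles=300, total_bit_cycles=1000)
inj_est = injection_vulnerability(failures=120, injections=1000)
factor = conservatism_factor(ace_est, inj_est)
print(f"ACE: {ace_est:.2f}, injection: {inj_est:.2f}, factor: {factor:.1f}x")
```

A factor above 1 means the ACE-style estimate overstates vulnerability relative to fault injection, which corresponds to a lower bound on reliability.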

Original language: English (US)
Title of host publication: ISCA'07
Subtitle of host publication: 34th Annual International Symposium on Computer Architecture, Conference Proceedings
Pages: 460-469
Number of pages: 10
State: Published - 2007
Event: ISCA'07: 34th Annual International Symposium on Computer Architecture - San Diego, CA, United States
Duration: Jun 9 2007 to Jun 13 2007

Publication series

Name: Proceedings - International Symposium on Computer Architecture
ISSN (Print): 1063-6897

Other

Other: ISCA'07: 34th Annual International Symposium on Computer Architecture
Country/Territory: United States
City: San Diego, CA
Period: 6/9/07 to 6/13/07

Keywords

  • Fault tolerance
  • Measurement techniques
  • Microprocessors
  • Soft errors

ASJC Scopus subject areas

  • General Engineering
