TY - GEN
T1 - Examining ACE analysis reliability estimates using fault-injection
AU - Wang, Nicholas J.
AU - Mahesri, Aqeel
AU - Patel, Sanjay J.
PY - 2007
Y1 - 2007
N2 - ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals. In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: from lack of detail in abstract performance models and from what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
AB - ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals. In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: from lack of detail in abstract performance models and from what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
KW - Fault tolerance
KW - Measurement techniques
KW - Microprocessors
KW - Soft errors
UR - http://www.scopus.com/inward/record.url?scp=35348921109&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35348921109&partnerID=8YFLogxK
U2 - 10.1145/1250662.1250719
DO - 10.1145/1250662.1250719
M3 - Conference contribution
AN - SCOPUS:35348921109
SN - 1595937064
SN - 9781595937063
T3 - Proceedings - International Symposium on Computer Architecture
SP - 460
EP - 469
BT - ISCA'07
T2 - ISCA'07: 34th Annual International Symposium on Computer Architecture
Y2 - 9 June 2007 through 13 June 2007
ER -