Using likely program invariants to detect hardware errors

Swarup Kumar Sahoo, Man Lap Li, Pradeep Ramachandran, Sarita V Adve, Vikram Sadanand Adve, Yuanyuan Zhou

Research output: Contribution to conference › Paper

Abstract

In the near future, hardware is expected to become increasingly vulnerable to faults due to continuously decreasing feature size. Software-level symptoms have previously been used to detect permanent hardware faults. However, they miss a small fraction of faults, which may lead to silent data corruptions (SDCs). In this paper, we present a system that uses invariants to improve the coverage and latency of existing detection techniques for permanent faults. The basic idea is to use training inputs to create likely invariants based on value ranges of selected program variables and then use them to identify faults at runtime. Likely invariants, however, can produce false positives, which makes them challenging to use for permanent faults. We use our on-line diagnosis framework to detect false positives at runtime and limit the number of false positives to keep the associated overhead minimal. Experimental results using microarchitecture-level fault injections in full-system simulation show a 28.6% reduction in the number of undetected faults and a 74.2% reduction in the number of SDCs over existing techniques, with reasonable overhead for the checking code.
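The idea in the abstract (learn value ranges on training inputs, then check them at runtime) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RangeInvariant`, `learn_invariants`, and `check` are hypothetical, the `is_false_positive` callback is only a stand-in for the on-line diagnosis step, and disabling an invariant once it is diagnosed as a false positive is an assumed policy for limiting repeated false positives.

```python
from dataclasses import dataclass


@dataclass
class RangeInvariant:
    """Likely invariant of the form lo <= value <= hi for one monitored variable."""
    lo: float = float("inf")
    hi: float = float("-inf")
    enabled: bool = True

    def train(self, value: float) -> None:
        # Training phase: widen the range to cover every value seen so far.
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def holds(self, value: float) -> bool:
        # A disabled invariant never reports a violation.
        return (not self.enabled) or (self.lo <= value <= self.hi)


def learn_invariants(training_runs):
    """Build one range invariant per variable from training traces.

    training_runs: iterable of traces; each trace is an iterable of
    (variable_name, value) pairs. This trace format is an assumption.
    """
    invariants = {}
    for trace in training_runs:
        for name, value in trace:
            invariants.setdefault(name, RangeInvariant()).train(value)
    return invariants


def check(invariants, name, value, is_false_positive):
    """Runtime check; returns True when a hardware fault is suspected.

    is_false_positive(name, value) is a hypothetical stand-in for the
    diagnosis step that separates real faults from invariants that were
    simply too tight for the current input.
    """
    inv = invariants.get(name)
    if inv is None or inv.holds(value):
        return False
    if is_false_positive(name, value):
        # Assumed policy: stop checking this invariant so it cannot fire again.
        inv.enabled = False
        return False
    return True
```

For example, after `invs = learn_invariants([[("idx", 0), ("idx", 7)], [("idx", 3)]])`, the learned range for `idx` is 0 to 7. A runtime value of 5 passes, while a value of 4096 is flagged unless the diagnosis callback classifies the violation as a false positive, in which case the invariant is disabled.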

Original language: English (US)
Pages: 70-79
Number of pages: 10
DOIs: https://doi.org/10.1109/DSN.2008.4630072
State: Published - Oct 13 2008
Event: 2008 International Conference on Dependable Systems and Networks, DSN-2008 - Anchorage, AK, United States
Duration: Jun 24 2008 → Jun 27 2008

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Sahoo, S. K., Li, M. L., Ramachandran, P., Adve, S. V., Adve, V. S., & Zhou, Y. (2008). Using likely program invariants to detect hardware errors. 70-79. Paper presented at 2008 International Conference on Dependable Systems and Networks, DSN-2008, Anchorage, AK, United States. https://doi.org/10.1109/DSN.2008.4630072

@conference{3273974d580e44a1979fbfd208a1f2e3,
title = "Using likely program invariants to detect hardware errors",
author = "Sahoo, {Swarup Kumar} and Li, {Man Lap} and Pradeep Ramachandran and Adve, {Sarita V} and Adve, {Vikram Sadanand} and Yuanyuan Zhou",
year = "2008",
month = "10",
day = "13",
doi = "10.1109/DSN.2008.4630072",
language = "English (US)",
pages = "70--79",
note = "2008 International Conference on Dependable Systems and Networks, DSN-2008 ; Conference date: 24-06-2008 Through 27-06-2008",

}
