Efficient software checking for fault tolerance

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Dramatic increases in the number of transistors that can be integrated on a chip make processors more susceptible to radiation-induced transient errors. For commodity chips that are cost- and energy-constrained, software approaches can play a major role in fault detection because they can be tailored to fit different requirements of reliability and performance. However, software approaches add a significant performance overhead because they replicate the instructions and add checking instructions to compare the results. To make software checking approaches more attractive, we use compiler techniques to identify the "unnecessary" replicas and checking instructions. In this paper, we present three techniques. The first technique uses boolean logic to identify code patterns that correspond to outcome-tolerant branches. The second technique identifies address checks before loads and stores that can be removed with different degrees of fault coverage. The third technique identifies the checking instructions and shadow registers that are unnecessary when the register file is protected in hardware. By combining the three techniques, the overheads of software approaches can be reduced by an average of 50%.
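The abstract does not show code, so the following is only an illustrative sketch of two ideas it mentions, not the paper's implementation. It assumes a toy model in which replicated instructions become duplicated Python expressions, a checking instruction becomes a comparison, and a transient fault is injected through the hypothetical parameters `flip_bit` and `force_first`. The names `FaultDetected`, `check`, `checked_sum`, and `or_chain` are invented for this example. The second function shows one plausible reading of an outcome-tolerant branch: a branch whose direction can be wrong without changing the program's result.

```python
class FaultDetected(Exception):
    """Raised when a value and its shadow replica disagree."""


def check(master, shadow):
    # Checking instruction: compare the original result with its replica.
    if master != shadow:
        raise FaultDetected(f"mismatch: {master!r} != {shadow!r}")
    return master


def checked_sum(a, b, flip_bit=None):
    # Replicated operands, standing in for shadow registers.
    a_shadow, b_shadow = a, b
    result = a + b                       # original instruction
    result_shadow = a_shadow + b_shadow  # replicated instruction
    if flip_bit is not None:
        # Hypothetical transient fault: flip one bit of the original result.
        result ^= 1 << flip_bit
    return check(result, result_shadow)


def or_chain(a, b, force_first=None):
    # `if a or b:` lowered to two conditional branches.
    # `force_first` is a hypothetical fault that overrides the first
    # branch's outcome.
    first = a if force_first is None else force_first
    if first:
        return "then"
    if b:
        return "then"
    return "else"
```

In this toy model, `checked_sum(2, 3, flip_bit=0)` raises `FaultDetected`, while the fault-free call returns the sum. For `or_chain`, when `b` is true the first branch reaches the same target whichever way it goes, so checking that branch adds cost without adding coverage; when `b` is false, a wrong outcome changes the result and the check is still needed.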

Original language: English (US)
Title of host publication: IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM
State: Published - 2008
Event: IPDPS 2008 - 22nd IEEE International Parallel and Distributed Processing Symposium - Miami, FL, United States
Duration: Apr 14 2008 to Apr 18 2008

Publication series

Name: IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM

Other

Other: IPDPS 2008 - 22nd IEEE International Parallel and Distributed Processing Symposium
Country: United States
City: Miami, FL
Period: 4/14/08 to 4/18/08

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Electrical and Electronic Engineering
