Fault-Tolerant Design Strategies for High Reliability and Safety

Nitin H. Vaidya, Dhiraj K. Pradhan

Research output: Contribution to journal › Article › peer-review

Abstract

Critical applications require systems with high reliability and safety. Reliability is the probability that the system produces correct output. Safety is defined as the probability that the system output is either correct, or the error in the output is detectable (the assumption being that the system is safe when the error is detected). Systems with high safety ensure that the probability of undetected errors is low. In this paper, several fundamental results related to reliability and safety are analyzed. Modular redundant systems consisting of multiple identical modules and an arbiter are considered. It is shown that for a given level of redundancy, a large number of implementation alternatives exist with varying degree of reliability and safety. Strategies are formulated that achieve a maximal combination of reliability and safety. The effect of increasing the number of modules on system reliability and safety is analyzed. It is shown that when one considers safety in addition to reliability, it does not necessarily help to simply add modules to the system. Specifically, increasing the number of modules by just one does not always improve both reliability and safety. To improve reliability and safety simultaneously, at least two additional modules are required when the outputs of the individual modules do not have any redundant information (e.g., coding for error detection). However, it is shown that if the modules themselves have built-in error detection capability, addition of just one module may be sufficient to improve both reliability and safety.
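
The trade-off described in the abstract can be illustrated with a small numerical sketch. The model below is an assumption made for illustration and is not taken from the paper's analysis: each of n modules is independently correct with probability p, all faulty modules are pessimistically assumed to produce the same wrong value, and a hypothetical threshold arbiter outputs a value only when at least t modules agree on it (with t > n/2), signalling a detected error otherwise. The function names and the parameter t are likewise illustrative.

```python
from math import comb


def at_least(n: int, k: int, prob: float) -> float:
    """P(at least k of n independent events occur), each with probability prob."""
    return sum(comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(k, n + 1))


def check_threshold(n: int, t: int) -> None:
    # A threshold above n/2 guarantees that at most one value can reach it.
    if not (n / 2 < t <= n):
        raise ValueError("threshold t must satisfy n/2 < t <= n")


def reliability(n: int, t: int, p: float) -> float:
    """P(output is correct): at least t modules are correct."""
    check_threshold(n, t)
    return at_least(n, t, p)


def safety(n: int, t: int, p: float) -> float:
    """P(output is correct or the error is detected).

    Under the assumed worst case (all faulty modules agree on one wrong
    value), an undetected error occurs exactly when at least t modules
    are faulty.
    """
    check_threshold(n, t)
    return 1.0 - at_least(n, t, 1 - p)


if __name__ == "__main__":
    p = 0.9  # assumed per-module probability of a correct output
    for n, t in [(3, 2), (4, 3), (5, 3)]:
        print(f"n={n} t={t} reliability={reliability(n, t, p):.5f} "
              f"safety={safety(n, t, p):.5f}")
```

Under these assumptions, going from three modules (t = 2) to four (t = 3) raises safety but lowers reliability, whereas going from three modules to five (t = 3) raises both. This is consistent with the abstract's claim for modules without built-in error detection, though the paper's own model and maximal schemes may differ from this sketch.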

Original language: English (US)
Pages (from-to): 1195-1206
Number of pages: 12
Journal: IEEE Transactions on Computers
Volume: 42
Issue number: 10
State: Published - Oct 1993
Externally published: Yes

Keywords

  • Fault tolerance
  • maximal schemes
  • modular redundancy
  • reliability
  • safety
  • trade-offs

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
