Understanding fault tolerance and reliability

Arun K. Somani, Nitin H Vaidya

Research output: Contribution to specialist publication › Article


A good fault-tolerant system design requires a careful study of design, failures, causes of failures, and system response to failures. Planning to avoid failures is the most important aspect of fault tolerance. A designer must analyze the environment and determine the failures that must be tolerated to achieve the desired level of reliability. To optimize fault tolerance, it is important to estimate actual failure rates for each possible failure. The basic principle of fault-tolerant design is redundancy, and there are three basic techniques to achieve it, namely, spatial (redundant hardware); informational (redundant data structures); and temporal (redundant computation).
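The three forms of redundancy named in the abstract can be illustrated with a minimal Python sketch. This is not taken from the article. The function names (`tmr_vote`, `protect`, `check`, `retry`), the choice of majority voting for spatial redundancy, CRC-32 for informational redundancy, and simple re-execution for temporal redundancy are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def tmr_vote(replica_outputs):
    """Spatial redundancy (hypothetical helper): majority vote over the
    outputs of redundant replicas, as in triple modular redundancy."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count * 2 <= len(replica_outputs):
        raise RuntimeError("no majority among replicas")
    return value


def protect(payload: bytes) -> bytes:
    """Informational redundancy (hypothetical helper): append a CRC-32
    so that corruption of the payload can be detected later."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(frame: bytes) -> bytes:
    """Verify the CRC-32 appended by protect() and return the payload."""
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def retry(operation, attempts=3):
    """Temporal redundancy (hypothetical helper): repeat a computation
    so that a transient fault in one attempt is masked by a later one."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # assumed: any exception signals a fault
            last_error = exc
    raise last_error
```

For example, `tmr_vote([7, 7, 9])` masks one faulty replica, `check(protect(b"data"))` round-trips an undamaged payload, and `retry` succeeds as long as at least one of its attempts does. Voting and retry mask faults, while the checksum alone only detects them. Recovery would need additional redundancy, such as an error-correcting code or a retransmission.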

Original language: English (US)
Number of pages: 6
Specialist publication: Computer
State: Published - Apr 1 1997

ASJC Scopus subject areas

  • Computer Science (all)
