Good fault-tolerant system design requires careful study of the design itself, the failures it may experience, the causes of those failures, and the system's response to them. Planning to avoid failures is the most important aspect of fault tolerance. A designer must analyze the environment and determine which failures must be tolerated to achieve the desired level of reliability. To optimize fault tolerance, it is important to estimate the actual rate of each possible failure. The basic principle of fault-tolerant design is redundancy, and there are three basic techniques to achieve it: spatial (redundant hardware), informational (redundant data structures), and temporal (redundant computation).
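The three forms of redundancy named above can be illustrated with a minimal Python sketch. It is not taken from the source; the function names (`vote`, `protect`, `verify`, `retry`) are hypothetical. The choices of majority voting for spatial redundancy, a CRC-32 checksum for informational redundancy, and simple re-execution for temporal redundancy are assumptions made purely for illustration.

```python
import zlib
from collections import Counter


def vote(replica_outputs):
    """Spatial redundancy (assumed form): majority vote over replica outputs."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count <= len(replica_outputs) // 2:
        raise RuntimeError("no majority among replicas")
    return value


def protect(payload: bytes) -> bytes:
    """Informational redundancy (assumed form): append a CRC-32 checksum."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(frame: bytes) -> bytes:
    """Return the payload if its stored checksum matches, else raise ValueError."""
    payload, stored = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != stored:
        raise ValueError("checksum mismatch")
    return payload


def retry(operation, attempts=3):
    """Temporal redundancy (assumed form): re-run a computation until it succeeds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # a transient fault; try again
            last_error = exc
    raise last_error
```

For instance, `vote([7, 7, 9])` masks the single faulty replica, `verify(protect(b"abc"))` returns the original payload, and `retry` hides a fault that clears within the allowed number of attempts. Each technique trades a different resource (hardware, storage, or time) for reliability.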