TY - JOUR
T1 - Hierarchical error detection in a software implemented fault tolerance (SIFT) environment
AU - Bagchi, Saurabh
AU - Srinivasan, Balaji
AU - Whisnant, Keith
AU - Kalbarczyk, Zbigniew
AU - Iyer, Ravishankar K.
N1 - Funding Information:
This work was supported in part by the Jet Propulsion Laboratory-NASA under contract JPL961345, in part by the US National Science Foundation under contract NSFCCR99-02026, and by a grant from Motorola Corporation. We thank Fran Baker for making many useful comments on the earlier versions of this manuscript.
PY - 2000
Y1 - 2000
N2 - This paper proposes a hierarchical error detection framework for a Software Implemented Fault Tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault-tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intralevel and interlevel optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the trade-offs between overhead and coverage for different combinations of techniques.
AB - This paper proposes a hierarchical error detection framework for a Software Implemented Fault Tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault-tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intralevel and interlevel optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the trade-offs between overhead and coverage for different combinations of techniques.
UR - http://www.scopus.com/inward/record.url?scp=0033726727&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033726727&partnerID=8YFLogxK
U2 - 10.1109/69.842263
DO - 10.1109/69.842263
M3 - Article
AN - SCOPUS:0033726727
SN - 1041-4347
VL - 12
SP - 203
EP - 224
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
ER -