Hierarchical error detection in a software implemented fault tolerance (SIFT) environment

Saurabh Bagchi, Balaji Srinivasan, Keith Whisnant, Zbigniew Kalbarczyk, Ravishankar K. Iyer

Research output: Contribution to journal, Article, peer-reviewed

Abstract

This paper proposes a hierarchical error detection framework for a Software Implemented Fault Tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault-tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intralevel and interlevel optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the trade-offs between overhead and coverage for different combinations of techniques.

Original language: English (US)
Pages (from-to): 203-224
Number of pages: 22
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 12
Issue number: 2
State: Published - 2000

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
