An adaptive algorithm for tolerating value faults and crash failures

Yansong Ren, Michel Cukier, William H. Sanders

Research output: Contribution to journal › Article › peer-review

Abstract

The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application's dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.
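
To make the idea of voting under a changing replica count concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a strict-majority threshold recomputed from the current group size, and it assumes that a crashed replica's reply is discarded; the class name `AdaptiveMajorityVoter` and all of its methods are hypothetical and are not part of AQuA.

```python
from collections import Counter


class AdaptiveMajorityVoter:
    """Toy majority voter whose replica set may change during a vote.

    Illustrative only: names and policies here are assumptions,
    not the AQuA implementation.
    """

    def __init__(self, replicas):
        self.replicas = set(replicas)  # replicas currently in the group
        self.replies = {}              # replica id -> value it reported

    def majority_size(self):
        # Strict majority of the current group size (assumed policy).
        return len(self.replicas) // 2 + 1

    def add_replica(self, replica_id):
        self.replicas.add(replica_id)

    def remove_replica(self, replica_id):
        # A crashed replica leaves the group; its reply is discarded
        # (an assumption made for this sketch).
        self.replicas.discard(replica_id)
        self.replies.pop(replica_id, None)

    def submit(self, replica_id, value):
        # Ignore replies from replicas outside the current group.
        if replica_id in self.replicas:
            self.replies[replica_id] = value

    def result(self):
        """Return the value backed by a majority, or None if undecided."""
        if not self.replies:
            return None
        value, count = Counter(self.replies.values()).most_common(1)[0]
        return value if count >= self.majority_size() else None


voter = AdaptiveMajorityVoter(["r1", "r2", "r3", "r4", "r5"])
voter.submit("r1", 42)
voter.submit("r2", 42)
undecided = voter.result()    # two of five replies: below the majority of 3
voter.remove_replica("r4")    # two crash failures shrink the group to 3
voter.remove_replica("r5")
decided = voter.result()      # majority is now 2, so the vote can complete
voter.submit("r3", 7)         # a single faulty value does not change the result
final = voter.result()
```

In this sketch the vote that was stalled at two matching replies out of five completes once two crashes reduce the majority size to two, and a later value fault from the remaining replica is outvoted.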

Original language: English (US)
Pages (from-to): 173-191
Number of pages: 19
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 12
Issue number: 2
State: Published - Feb 2001

Keywords

  • Adaptive fault tolerance
  • CORBA
  • Dependable distributed systems
  • Group communication systems
  • Replication protocols

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
