Error Recovery in Asynchronous Systems

Roy H. Campbell, Brian Randell

Research output: Contribution to journal (Article)

Abstract

The demand for highly reliable computer systems has led to techniques for the construction of fault-tolerant software systems. A fault-tolerant system detects errors created as the effects of a fault and applies error recovery provisions in the form of abnormal or exceptional mechanisms and algorithms to continue operation and restore normal computation. Backward error recovery is intended to restore a system state which occurred prior to the manifestation of the fault. Forward error recovery is intended to correct or isolate specific errors and is accomplished in the system state containing the errors. The organization and control of error recovery in asynchronous systems is very complex. Nevertheless, it is possible to limit this complexity by appropriate system structuring aids. Techniques for structuring backward error recovery are comparatively well understood. This paper proposes techniques for structuring forward error recovery measures in asynchronous systems and generalizes recent ideas of atomic actions (transactions) so as to support fault-tolerant interactions between processes.

Original language: English (US)
Pages (from-to): 811-826
Number of pages: 16
Journal: IEEE Transactions on Software Engineering
Volume: SE-12
Issue number: 8
State: Published - Aug 1986

Keywords

  • Asynchronous systems
  • atomic actions
  • error recovery
  • exception mechanism
  • programming techniques
  • software fault tolerance
  • software reliability

ASJC Scopus subject areas

  • Software
