On low-cost error containment and recovery methods for guarded software upgrading

Ann T. Tai, Kam S. Tso, Leon Alkalai, Savio N. Chau, William H. Sanders

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containment and recovery for GSU. To ensure low development cost, we exploit inherent system resource redundancies as the fault tolerance means. In order to mitigate the effect of residual software faults at low performance cost, we take a crucial step in devising error containment and recovery methods by introducing the 'confidence-driven' notion. This notion complements the message-driven (or 'communication-induced') approach employed by a number of existing checkpointing protocols for tolerating hardware faults. In particular, we discriminate between the individual software components with respect to our confidence in their reliability, and keep track of changes in our confidence (due to knowledge about potential process state contamination) in particular processes. This, in turn, enables the individual processes in the spaceborne distributed system to make decisions locally, at run-time, on whether to establish a checkpoint upon message passing and whether to roll back or roll forward during error recovery. The resulting message-driven, confidence-driven approach enables cost-effective checkpointing and cascading-rollback-free recovery.
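
The local decision described in the abstract can be illustrated with a minimal sketch. It is not the protocol from the paper: the `Process` class, the `trusted` and `dirty` flags, and the rule "checkpoint just before the first message from a suspect sender is applied" are all assumptions made for illustration. The sketch only shows how a process might decide on its own, at message-passing time, whether to checkpoint, and later whether to roll back or roll forward.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    """Hypothetical process model; names and rules are illustrative assumptions."""
    name: str
    trusted: bool                      # assumed: True for a high-confidence component
    state: dict = field(default_factory=dict)
    dirty: bool = False                # assumed flag: state may be contaminated
    checkpoint: Optional[dict] = None  # last state saved while still trusted

    def is_suspect(self) -> bool:
        # A sender is suspect if it is a low-confidence component or if its
        # own state may already be contaminated.
        return (not self.trusted) or self.dirty

    def receive(self, sender_suspect: bool, key: str, value: int) -> bool:
        """Apply a message; return True if a checkpoint was taken first."""
        took_checkpoint = False
        if sender_suspect and not self.dirty:
            # Confidence in our own state is about to drop, so save the
            # last state we still trust before applying the message.
            self.checkpoint = copy.deepcopy(self.state)
            self.dirty = True
            took_checkpoint = True
        self.state[key] = value
        return took_checkpoint

    def recover(self) -> str:
        """Decide locally how to recover after an error is detected."""
        if self.dirty and self.checkpoint is not None:
            self.state = copy.deepcopy(self.checkpoint)
            self.dirty = False
            return "roll back"
        # State was never exposed to a suspect sender: keep going.
        return "roll forward"


# Usage: an upgraded (low-confidence) component messages a trusted peer.
upgraded = Process("upgraded", trusted=False)
peer = Process("peer", trusted=True, state={"x": 1})
peer.receive(upgraded.is_suspect(), "x", 99)
print(peer.recover(), peer.state)
```

In this sketch a process checkpoints at most once per contamination episode, which is one way to read "cost-effective checkpointing"; the actual conditions used by GSU are defined in the paper itself.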

Original language: English (US)
Title of host publication: Proceedings - International Conference on Distributed Computing Systems
Publisher: IEEE
Pages: 548-555
Number of pages: 8
State: Published - 2000
Externally published: Yes
Event: 20th International Conference on Distributed Computing Systems (ICDCS 2000) - Taipei, Taiwan
Duration: Apr 10 2000 to Apr 13 2000

ASJC Scopus subject areas

  • Hardware and Architecture
