Abstract

This paper presents a measurement-based study of software failures and recovery in the Tandem GUARDIAN90 operating system, using a collection of memory dump analyses of field software failures. We identify the effects of software faults on the processor state and trace the propagation of those effects to other areas of the system. We also evaluate the role of defensive programming techniques and the software fault tolerance of the process pair mechanism implemented in the Tandem system. Results show that the Tandem system tolerates nearly 82% of reported field software faults, demonstrating the effectiveness of the system against software faults. Consistency checks made by the operating system detect 52% of software problems and prevent any error propagation in 31% of software problems. Results also show that 72% of reported field software failures are recurrences of known software faults and that 70% of the recurrence groups have identical characteristics.

Original language: English (US)
Title of host publication: Digest of Papers - International Symposium on Fault-Tolerant Computing
Editors: Anon
Publisher: IEEE
Pages: 20-29
Number of pages: 10
ISBN (Print): 0818636823
State: Published - 1993
Event: Proceedings of the 23rd International Symposium on Fault-Tolerant Computing - Toulouse, France
Duration: Jun 22 1993 to Jun 24 1993

Publication series

Name: Digest of Papers - International Symposium on Fault-Tolerant Computing
ISSN (Print): 0731-3071

ASJC Scopus subject areas

  • Hardware and Architecture

Title: Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system