Abstract

This paper presents a systematic methodology to investigate the dependability of operational software. The methodology combines several techniques. Time series analysis is used to characterize the occurrence of software failures. Markov reward modeling is used to determine the loss in service due to failures of software components, and to identify major bottlenecks. The effectiveness of built-in fault tolerance is also evaluated. The methodology is illustrated using the software halt data from the Tandem GUARDIAN operating system. The results show that the occurrences of software halts are not correlated with each other in time. Interrupt handling and memory management are found to be the major bottlenecks in the measured system. The fault tolerance in the measured system was shown to reduce the service loss by nearly 90%.
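As a rough illustration of the time series step, the sketch below computes the sample autocorrelation of a series of halt counts and flags lags that exceed the approximate 95% white-noise bound of 1.96/sqrt(n). This is an assumption about how such a check could be done, not the authors' procedure. The function names and the `halts_per_day` data are hypothetical and do not come from the paper.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: lagged cross-products of deviations from the mean.
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def significant_lags(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% white-noise bound."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > bound]


if __name__ == "__main__":
    # Hypothetical daily halt counts, made up for illustration only.
    halts_per_day = [0, 1, 0, 0, 2, 0, 1, 0, 0, 1,
                     0, 0, 0, 1, 0, 2, 0, 0, 1, 0]
    print(significant_lags(halts_per_day, 5))
```

An empty list from `significant_lags` would be consistent with halts that are uncorrelated in time at the lags tested. A non-empty list points to lags worth examining more closely.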

Original language: English (US)
Article number: 285887
Pages (from-to): 227-236
Number of pages: 10
Journal: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
DOI: 10.1109/ISSRE.1992.285887
State: Published - Jan 1 1992
Event: 3rd International Symposium on Software Reliability Engineering, ISSRE 1992 - Research Triangle Park, United States
Duration: Oct 7 1992 to Oct 10 1992

Fingerprint

Fault tolerance
Time series analysis
Data storage equipment

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Cite this

@article{acb734494f414127acb49ba5127e53c1,
title = "Analysis of software halts in the Tandem GUARDIAN operating system",
abstract = "This paper presents a systematic methodology to investigate the dependability of operational software. The methodology combines several techniques. Time series analysis is used to characterize the occurrence of software failures. Markov reward modeling is used to determine the loss in service due to failures of software components, and to identify major bottlenecks. The effectiveness of built-in fault tolerance is also evaluated. The methodology is illustrated using the software halt data from the Tandem GUARDIAN operating system. The results show that the occurrences of software halts are not correlated with each other in time. Interrupt handling and memory management are found to be the major bottlenecks in the measured system. The fault tolerance in the measured system was shown to reduce the service loss by nearly 90{\%}.",
author = "Lee, Inhwan and Iyer, Ravishankar K",
year = "1992",
month = "1",
day = "1",
doi = "10.1109/ISSRE.1992.285887",
language = "English (US)",
pages = "227--236",
journal = "Proceedings of the International Symposium on Software Reliability Engineering, ISSRE",
issn = "1071-9458",

}

TY - JOUR

T1 - Analysis of software halts in the Tandem GUARDIAN operating system

AU - Lee, Inhwan

AU - Iyer, Ravishankar K

PY - 1992/1/1

Y1 - 1992/1/1

AB - This paper presents a systematic methodology to investigate the dependability of operational software. The methodology combines several techniques. Time series analysis is used to characterize the occurrence of software failures. Markov reward modeling is used to determine the loss in service due to failures of software components, and to identify major bottlenecks. The effectiveness of built-in fault tolerance is also evaluated. The methodology is illustrated using the software halt data from the Tandem GUARDIAN operating system. The results show that the occurrences of software halts are not correlated with each other in time. Interrupt handling and memory management are found to be the major bottlenecks in the measured system. The fault tolerance in the measured system was shown to reduce the service loss by nearly 90%.

UR - http://www.scopus.com/inward/record.url?scp=33645395626&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645395626&partnerID=8YFLogxK

U2 - 10.1109/ISSRE.1992.285887

DO - 10.1109/ISSRE.1992.285887

M3 - Conference article

AN - SCOPUS:33645395626

SP - 227

EP - 236

JO - Proceedings of the International Symposium on Software Reliability Engineering, ISSRE

JF - Proceedings of the International Symposium on Software Reliability Engineering, ISSRE

SN - 1071-9458

M1 - 285887

ER -