Analysis of the VAX/VMS error logs in multicomputer environments - A case study of software dependability

Research output: Contribution to journal › Conference article

Abstract

This paper presents an analysis of the software error logs produced by the VAX/VMS operating system from two VAXcluster multicomputer environments. Basic error characteristics are identified by statistical analysis. Correlations between software and hardware errors, and among software errors on different machines are investigated. Finally, reward analysis and reliability growth analysis are performed to evaluate software dependability. Results show that major software problems in the measured systems are from program flow control and I/O management. The network-related software is suspected to be a reliability bottleneck. It is shown that a multicomputer software Time Between Error distribution can be modeled by a 2-phase hyperexponential random variable: a lower error rate pattern which characterizes regular errors, and a higher error rate pattern which characterizes error bursts and concurrent errors on multiple machines. The VAX/VMS software reliability growth, during the measured period of more than three years and under the workloads running on the measured systems, can be modeled by a scaled exponential function which takes hardware-induced software failures into account.
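
The 2-phase hyperexponential model mentioned in the abstract can be illustrated with a minimal sketch. It treats the time between errors as a mixture of two exponential distributions: with probability `p` the next interval is drawn from the lower-rate (regular error) phase, otherwise from the higher-rate (burst) phase. The function names and the parameter values used below are illustrative assumptions; the abstract does not give the fitted parameters, so none of the numbers come from the paper.

```python
import math
import random


def hyperexp_pdf(t, p, rate_low, rate_high):
    """Density of a 2-phase hyperexponential (mixture of two exponentials)."""
    return (p * rate_low * math.exp(-rate_low * t)
            + (1 - p) * rate_high * math.exp(-rate_high * t))


def hyperexp_mean(p, rate_low, rate_high):
    """Mean time between errors: weighted average of the two phase means."""
    return p / rate_low + (1 - p) / rate_high


def hyperexp_sample(p, rate_low, rate_high, rng):
    """Draw one inter-error time: pick a phase, then sample an exponential."""
    rate = rate_low if rng.random() < p else rate_high
    return rng.expovariate(rate)


if __name__ == "__main__":
    # Hypothetical parameters (not from the paper): 70% of intervals come
    # from the low-rate phase, 30% from the high-rate burst phase.
    p, rate_low, rate_high = 0.7, 0.01, 1.0
    rng = random.Random(0)
    samples = [hyperexp_sample(p, rate_low, rate_high, rng) for _ in range(20000)]
    print("analytic mean: ", hyperexp_mean(p, rate_low, rate_high))
    print("empirical mean:", sum(samples) / len(samples))
```

A mixture of this kind has a coefficient of variation greater than 1, which is why it is a common choice for data that mixes long quiet intervals with short bursts.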

Original language: English (US)
Article number: 285886
Pages (from-to): 216-226
Number of pages: 11
Journal: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
DOIs: 10.1109/ISSRE.1992.285886
State: Published - Jan 1 1992
Event: 3rd International Symposium on Software Reliability Engineering, ISSRE 1992 - Research Triangle Park, United States
Duration: Oct 7 1992 to Oct 10 1992

Fingerprint

Hardware
Software reliability
Exponential functions
Random variables
Flow control
Statistical methods

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Cite this

@article{ac6d7fca0b8c4325a12676e099706f52,
title = "Analysis of the VAX/VMS error logs in multicomputer environments - A case study of software dependability",
abstract = "This paper presents an analysis of the software error logs produced by the VAX/VMS operating system from two VAXcluster multicomputer environments. Basic error characteristics are identified by statistical analysis. Correlations between software and hardware errors, and among software errors on different machines are investigated. Finally, reward analysis and reliability growth analysis are performed to evaluate software dependability. Results show that major software problems in the measured systems are from program flow control and I/O management. The network-related software is suspected to be a reliability bottleneck. It is shown that a multicomputer software Time Between Error distribution can be modeled by a 2-phase hyperexponential random variable: a lower error rate pattern which characterizes regular errors, and a higher error rate pattern which characterizes error bursts and concurrent errors on multiple machines. The VAX/VMS software reliability growth, during the measured period of more than three years and under the workloads running on the measured systems, can be modeled by a scaled exponential function which takes hardware-induced software failures into account.",
author = "Tang, Dong and Iyer, {Ravishankar K}",
year = "1992",
month = "1",
day = "1",
doi = "10.1109/ISSRE.1992.285886",
language = "English (US)",
pages = "216--226",
journal = "Proceedings of the International Symposium on Software Reliability Engineering, ISSRE",
issn = "1071-9458",

}

TY - JOUR

T1 - Analysis of the VAX/VMS error logs in multicomputer environments - A case study of software dependability

AU - Tang, Dong

AU - Iyer, Ravishankar K

PY - 1992/1/1

Y1 - 1992/1/1

AB - This paper presents an analysis of the software error logs produced by the VAX/VMS operating system from two VAXcluster multicomputer environments. Basic error characteristics are identified by statistical analysis. Correlations between software and hardware errors, and among software errors on different machines are investigated. Finally, reward analysis and reliability growth analysis are performed to evaluate software dependability. Results show that major software problems in the measured systems are from program flow control and I/O management. The network-related software is suspected to be a reliability bottleneck. It is shown that a multicomputer software Time Between Error distribution can be modeled by a 2-phase hyperexponential random variable: a lower error rate pattern which characterizes regular errors, and a higher error rate pattern which characterizes error bursts and concurrent errors on multiple machines. The VAX/VMS software reliability growth, during the measured period of more than three years and under the workloads running on the measured systems, can be modeled by a scaled exponential function which takes hardware-induced software failures into account.

UR - http://www.scopus.com/inward/record.url?scp=33747379884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33747379884&partnerID=8YFLogxK

U2 - 10.1109/ISSRE.1992.285886

DO - 10.1109/ISSRE.1992.285886

M3 - Conference article

AN - SCOPUS:33747379884

SP - 216

EP - 226

JO - Proceedings of the International Symposium on Software Reliability Engineering, ISSRE

JF - Proceedings of the International Symposium on Software Reliability Engineering, ISSRE

SN - 1071-9458

M1 - 285886

ER -