Abstract

A methodology is developed for automatically detecting symptoms of frequently occurring errors in large computer systems. Both the proposed symptom recognition methodology and its validation are based on probabilistic techniques. The technique is shown to work on real failure data from two CYBER systems at the University of Illinois. The methodology distinguishes between independent and dependent causes and also quantifies the strength of the relationship among the errors. A comparison with failure/repair information obtained from field maintenance engineers shows that, in 85% of the cases, the error symptoms recognized by this approach correspond to real system problems. The remaining 15%, although not directly supported by field data, were confirmed as valid problems. Some of these were shown to be persistent problems that would otherwise have been dismissed as minor transients and ignored.

Original language: English (US)
Title of host publication: Unknown Host Publication Title
Editors: Harold S. Stone
Publisher: IEEE
Pages: 797-806
Number of pages: 10
ISBN (Print): 0818607432
State: Published - 1986

ASJC Scopus subject areas

  • General Engineering

Title: Recognition of Error Symptoms in Large Systems