Measurement and Modeling of Computer Reliability as Affected by System Activity

R. K. Iyer, D. J. Rossetti, M. C. Hsueh

Research output: Contribution to journal > Article > peer-review

Abstract

This paper demonstrates a practical approach to the study of the failure behavior of computer systems. Particular attention is devoted to the analysis of permanent failures. A number of important techniques, which may have general applicability in both failure and workload analysis, are brought together in this presentation. These include: smeared averaging of the workload data, clustering of like failures, and joint analysis of workload and failures. Approximately 17 percent of all failures affecting the CPU were estimated to be permanent. The manifestation of a permanent failure was found to be strongly correlated with the level and type of workload prior to the failure. Although, in strict terms, the results only relate to the manifestation of permanent failures and not to their occurrence, there are strong indications that permanent failures are both caused and discovered by increased activity. More measurements and experiments are necessary to determine their respective contributions to the measured workload/failure relationship.

Original language: English (US)
Pages (from-to): 214-237
Number of pages: 24
Journal: ACM Transactions on Computer Systems (TOCS)
Volume: 4
Issue number: 3
State: Published - Aug 1, 1986

Keywords

  • Data analysis
  • failure measurement
  • system activity
  • workload measurement

ASJC Scopus subject areas

  • Computer Science (all)
