Abstract

This paper presents the rationale for DEPEND, a functional simulation tool that provides an integrated design and fault injection environment for system-level dependability analysis. It discusses the issues that arise in developing such a tool and describes how DEPEND addresses them. It presents techniques for simulating realistic fault scenarios, limiting the explosion in simulation time, and handling the large fault model and component domain associated with system-level analysis. Examples motivate and illustrate the benefits of the tool. To further illustrate its capabilities, DEPEND is used to simulate the Unix-based Tandem prototype fault-tolerant system, which is built on triple modular redundancy (TMR), and to evaluate how well that system handles near-coincident errors caused by correlated and latent faults. Factors that affect how the system handles near-coincident errors, such as memory scrubbing, re-integration policies, and workload-dependent repair times, are also evaluated. Unlike other simulation-based dependability studies, this one validates the accuracy of the simulation model by comparing the simulation results with measurements from fault injection experiments conducted on a production Tandem machine.
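As a rough illustration of the kind of question the abstract raises (how a TMR system copes with a second error arriving while a first is still being repaired), the following minimal Monte Carlo sketch may help. It is not DEPEND and does not reproduce its models or interface: the function names, the independent exponential fault arrivals, and the fixed (not workload-dependent) repair time are all assumptions made for this example, and correlated or latent faults are not modeled.

```python
import random


def simulate_tmr(fault_rate, repair_time, mission_time, rng):
    """Return True if a toy TMR system survives one mission.

    Assumptions (illustrative only, not taken from the paper):
    - each of the three modules faults independently, with exponentially
      distributed time to the next fault (rate `fault_rate`);
    - a faulted module is back in service `repair_time` later;
    - the system fails if a module faults while another module is still
      under repair, i.e. on a near-coincident error.
    """
    next_fault = [rng.expovariate(fault_rate) for _ in range(3)]
    repaired_at = [0.0, 0.0, 0.0]  # time at which each module is healthy again

    while True:
        # Pick the module with the earliest pending fault.
        m = min(range(3), key=lambda i: next_fault[i])
        t = next_fault[m]
        if t > mission_time:
            return True  # no further fault before the mission ends
        # A second fault while another module is still down defeats the voter.
        if any(repaired_at[i] > t for i in range(3) if i != m):
            return False
        repaired_at[m] = t + repair_time
        # The repaired module can fault again only after it is back in service.
        next_fault[m] = repaired_at[m] + rng.expovariate(fault_rate)


def estimate_survival(fault_rate, repair_time, mission_time, runs=10_000, seed=0):
    """Estimate the mission survival probability by repeated simulation."""
    rng = random.Random(seed)
    survived = sum(
        simulate_tmr(fault_rate, repair_time, mission_time, rng)
        for _ in range(runs)
    )
    return survived / runs


if __name__ == "__main__":
    # Hypothetical parameters: faults per hour, repair hours, mission hours.
    for repair in (0.1, 1.0, 10.0):
        print(repair, estimate_survival(0.01, repair, 1000.0))
```

Sweeping `repair_time` in this way gives a feel for why repair and re-integration delays matter: the longer a module stays out of service, the wider the window in which a second fault is fatal.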

Original language: English (US)
Pages (from-to): 60-74
Number of pages: 15
Journal: IEEE Transactions on Computers
Volume: 46
Issue number: 1
State: Published - 1997

Keywords

  • Correlated errors
  • Dependability analysis
  • Fault injection
  • Intercomponent dependence
  • Latent errors
  • Object-oriented design
  • Simulation
  • Tandem TMR-based prototype analysis
  • Validation

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Title: DEPEND: A simulation-based environment for system level dependability analysis