Abstract

This paper presents DEFINE, a distributed fault injection and monitoring environment, as a tool for evaluating system dependability, investigating fault propagation, and validating fault-tolerance mechanisms. DEFINE can inject both hardware faults (hardware-induced software errors) and software faults into any process running in a distributed system, in either user mode or supervisor mode, and can monitor fault impact and propagation within software systems and among machines. It employs two fault injection techniques: (i) hardware clock interrupts, used to control the time of fault injection and activation, and (ii) software traps, used to inject all faults except communication faults and memory faults in the data/stack segment. To demonstrate the application of DEFINE, experiments studying system behavior under faults are conducted on six Sun SPARCstations.
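
As a rough illustration of the first technique (timer-controlled injection), the following Python sketch uses a POSIX interval timer (`signal.setitimer` delivering `SIGALRM`) as a user-space stand-in for the hardware clock interrupt that decides when a fault is injected, and a single-bit flip in a `bytearray` as a stand-in for a hardware-induced memory error. The function names `flip_bit` and `run_with_timed_fault`, the bit-flip fault model, and the CRC-32 signature check used as a minimal "monitor" are all hypothetical choices for illustration; they are not DEFINE's implementation, which targets real processes in user or supervisor mode across machines. The software-trap technique is not modeled. The sketch is Unix-only and must run in the main thread.

```python
import signal
import time
import zlib


def flip_bit(buf: bytearray, byte_index: int, bit: int) -> None:
    """Hypothetical single-bit-flip fault model: invert one bit of the target memory image."""
    buf[byte_index] ^= 1 << bit


def run_with_timed_fault(buf: bytearray, delay_s: float, byte_index: int, bit: int) -> dict:
    """Arm a one-shot timer; when SIGALRM fires, inject a bit flip into `buf`.

    Assumption: SIGALRM from setitimer stands in for the hardware clock
    interrupt that selects the injection time in the paper's approach.
    """
    golden = zlib.crc32(bytes(buf))  # fault-free reference signature
    state = {"injected_at": None}

    def on_alarm(signum, frame):
        # Fault injection happens asynchronously, at the moment the timer expires.
        flip_bit(buf, byte_index, bit)
        state["injected_at"] = time.monotonic()

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    signal.setitimer(signal.ITIMER_REAL, delay_s)  # one-shot: interval defaults to 0
    try:
        # Toy "workload": keep reading the buffer until the fault has been injected,
        # bounded so the sketch cannot hang if the signal never arrives.
        while state["injected_at"] is None and time.monotonic() - start < 5.0:
            _ = sum(buf)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # disarm any pending timer
        signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler

    observed = zlib.crc32(bytes(buf))
    return {
        "injected": state["injected_at"] is not None,
        "latency_s": None if state["injected_at"] is None else state["injected_at"] - start,
        "detected": observed != golden,  # monitor: did the signature change?
    }


if __name__ == "__main__":
    memory = bytearray(range(256))
    print(run_with_timed_fault(memory, delay_s=0.05, byte_index=10, bit=3))
```

CRC-32 is used here because it detects every single-bit error, so the toy monitor always flags the injected flip; a real monitoring environment would instead observe the effects of the fault on program behavior and its propagation to other processes and machines.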

Original language: English (US)
Title of host publication: Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, FTPDS 1994
Editors: Dhiraj Pradhan, Dimiter Avresky
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 252-259
Number of pages: 8
ISBN (Electronic): 0818668075, 9780818668074
State: Published - 1994
Event: 1994 IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, FTPDS 1994 - College Station, United States
Duration: Jun 12 1994 to Jun 14 1994

Conference

Conference: 1994 IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, FTPDS 1994
Country/Territory: United States
City: College Station
Period: 6/12/94 to 6/14/94

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality
