Recovery in multicomputers with finite error detection latency

P. Krishna, N. H. Vaidya, D. K. Pradhan

Research output: Contribution to journal › Conference article › peer-review

Abstract

Most research on checkpointing and recovery assumes that a processor halts immediately in response to any internal failure (the fail-stop model). This paper presents a recovery scheme, based on independent checkpointing and message logging, for a multicomputer system whose processors have a non-zero error detection latency. The scheme tolerates bounded error detection latencies, thereby achieving higher fault coverage. Simulation results show that, for typical detection latency values, the recovery overhead is almost independent of the detection latency.

Original language: English (US)
Article number: 5727788
Pages (from-to): II206-II210
Journal: Proceedings of the International Conference on Parallel Processing
Volume: 2
State: Published - 1994
Externally published: Yes
Event: 23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States
Duration: Aug 15 1994 → Aug 19 1994

ASJC Scopus subject areas

  • Software
  • General Mathematics
  • Hardware and Architecture
