Abstract
Most research on checkpointing and recovery assumes that a processor halts immediately in response to any internal failure (the fail-stop model). This paper presents a recovery scheme, based on independent checkpointing and message logging, for a multicomputer system whose processors have a non-zero error detection latency. Our scheme tolerates bounded error detection latencies, thus achieving higher fault coverage. The simulation results show that, for typical detection latency values, the recovery overhead is almost independent of the detection latency.
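To illustrate why a bound on detection latency matters for rollback, the following is a minimal sketch and not the paper's algorithm. It assumes that an error detected at time `detection_time` cannot have occurred earlier than `detection_time - max_latency`, so only checkpoints taken strictly before that instant are treated as uncontaminated. The function name `latest_safe_checkpoint` and its parameters are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def latest_safe_checkpoint(
    checkpoint_times: Sequence[float],
    detection_time: float,
    max_latency: float,
) -> Optional[float]:
    """Return the newest checkpoint taken strictly before the earliest
    possible error time, or None if no such checkpoint exists.

    Assumption (illustrative, not taken from the paper): the error occurred
    no earlier than detection_time - max_latency. `checkpoint_times` must be
    sorted in ascending order.
    """
    earliest_error_time = detection_time - max_latency
    # bisect_left gives the index of the first checkpoint that is
    # >= earliest_error_time; everything before that index is strictly older.
    idx = bisect_left(checkpoint_times, earliest_error_time)
    return checkpoint_times[idx - 1] if idx > 0 else None


checkpoints = [0.0, 10.0, 20.0, 30.0]
# Fail-stop case (zero latency): the most recent checkpoint is usable.
fail_stop_choice = latest_safe_checkpoint(checkpoints, 32.0, 0.0)
# Bounded latency of 5 time units: the checkpoint at 30.0 may be contaminated.
bounded_choice = latest_safe_checkpoint(checkpoints, 32.0, 5.0)
```

Under these assumptions, the fail-stop case selects the checkpoint at 30.0, while a latency bound of 5 forces a rollback to the checkpoint at 20.0. In a scheme with message logging, messages recorded after the chosen checkpoint would then presumably be replayed, which this sketch does not model.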
Original language | English (US) |
---|---|
Article number | 5727788 |
Pages (from-to) | II206-II210 |
Journal | Proceedings of the International Conference on Parallel Processing |
Volume | 2 |
State | Published - 1994 |
Externally published | Yes |
Event | 23rd International Conference on Parallel Processing, ICPP 1994, Raleigh, NC, United States; Duration: Aug 15 1994 → Aug 19 1994 |
ASJC Scopus subject areas
- Software
- General Mathematics
- Hardware and Architecture