In most research on checkpointing and recovery, it has been assumed that the processor halts immediately in response to any internal failure (fail-stop model). This paper presents a recovery scheme (independent checkpointing and message logging) for a multicomputer system consisting of processors having a non-zero error detection latency. Our scheme tolerates bounded error detection latencies, thus, achieving a higher fault coverage. The simulation results show that for typical detection latency values, the recovery overhead is almost independent of the detection latency.
|Original language||English (US)|
|Journal||Proceedings of the International Conference on Parallel Processing|
|State||Published - 1994|
|Event||23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States|
Duration: Aug 15 1994 → Aug 19 1994
ASJC Scopus subject areas
- Hardware and Architecture