Abstract

Group communication protocols constitute a basic building block for highly dependable distributed applications. Designing and correctly implementing a group communication system (GCS) is a difficult task. While many theoretical algorithms have been formalized and proved correct, only a few research projects have experimentally assessed the dependability of GCS implementations under complex error scenarios. This paper describes a thorough error-injection experimental campaign conducted on Ensemble, a popular GCS. By employing synthetic benchmark applications, we stress selected components of the GCS - the group membership service, the FIFO-ordered reliable multicast, and the sequencer-based, total-ordered reliable multicast - under various error models, including errors in memory (text and heap segments) and in network messages. The data show that about 5-6% of the failures are due to an error escaping Ensemble's error-containment mechanism and manifesting as a fail-silence violation. This constitutes an impediment to achieving high dependability, the natural objective of GCSs. Our results are derived for a particular system (Ensemble), and more investigation involving other GCSs is required to generalize the conclusions. Nevertheless, through an accurate analysis of the failure causes and the error propagation patterns, this paper offers insights into the design and implementation of robust GCSs.

Original language: English (US)
Pages (from-to): 35-44
Number of pages: 10
Journal: Proceedings of the IEEE Symposium on Reliable Distributed Systems
State: Published - 2003
Event: 22nd International Symposium on Reliable Distributed Systems, SRDS 2003 - Florence, Italy
Duration: Oct 6 2003 to Oct 8 2003

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
