Abstract

Group communication protocols constitute a basic building block for highly dependable distributed applications. Designing and correctly implementing a group communication system (GCS) is a difficult task. While many theoretical algorithms have been formalized and proved correct, only a few research projects have experimentally assessed the dependability of GCS implementations under complex error scenarios. This paper describes a thorough error-injection experimental campaign conducted on Ensemble, a popular GCS. By employing synthetic benchmark applications, we stress selected components of the GCS (the group membership service, the FIFO-ordered reliable multicast, and the sequencer-based, total-ordered reliable multicast) under various error models, including errors in memory (text and heap segments) and in network messages. The data show that about 5-6% of the failures are due to an error escaping Ensemble's error-containment mechanism and manifesting as a fail-silence violation. This constitutes an impediment to achieving high dependability, the natural objective of GCSs. Our results are derived for a particular system (Ensemble), and more investigation involving other GCSs is required to generalize the conclusions. Nevertheless, through an accurate analysis of the failure causes and the error propagation patterns, this paper offers insights into the design and implementation of robust GCSs.

Original language: English (US)
Pages (from-to): 35-44
Number of pages: 10
Journal: Proceedings of the IEEE Symposium on Reliable Distributed Systems
State: Published - Dec 1 2003
Event: 22nd International Symposium on Reliable Distributed Systems, SRDS 2003 - Florence, Italy
Duration: Oct 6 2003 to Oct 8 2003

Fingerprint

Group Communication
Communication Protocol
Communication Systems
Network protocols
Ensemble
Communication systems
Dependability
Multicast
Model Error
Error Propagation
Error Model
Heap
Distributed Applications
Building Blocks
Injection
Correctness
Benchmark
Scenarios
Generalise
Data storage equipment

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Group communication protocols under errors. / Basile, Claudio; Wang, Long; Kalbarczyk, Zbigniew; Iyer, Ravi.

In: Proceedings of the IEEE Symposium on Reliable Distributed Systems, 01.12.2003, p. 35-44.

Research output: Contribution to journal (Conference article)

@article{5e2560fb151e429f9a0f7ae606a57ce5,
title = "Group communication protocols under errors",
abstract = "Group communication protocols constitute a basic building block for highly dependable distributed applications. Designing and correctly implementing a group communication system (GCS) is a difficult task. While many theoretical algorithms have been formalized and proved correct, only a few research projects have experimentally assessed the dependability of GCS implementations under complex error scenarios. This paper describes a thorough error-injection experimental campaign conducted on Ensemble, a popular GCS. By employing synthetic benchmark applications, we stress selected components of the GCS (the group membership service, the FIFO-ordered reliable multicast, and the sequencer-based, total-ordered reliable multicast) under various error models, including errors in memory (text and heap segments) and in network messages. The data show that about 5-6{\%} of the failures are due to an error escaping Ensemble's error-containment mechanism and manifesting as a fail-silence violation. This constitutes an impediment to achieving high dependability, the natural objective of GCSs. Our results are derived for a particular system (Ensemble), and more investigation involving other GCSs is required to generalize the conclusions. Nevertheless, through an accurate analysis of the failure causes and the error propagation patterns, this paper offers insights into the design and implementation of robust GCSs.",
author = "Claudio Basile and Long Wang and Zbigniew Kalbarczyk and Ravi Iyer",
year = "2003",
month = "12",
day = "1",
language = "English (US)",
pages = "35--44",
journal = "Proceedings of the IEEE Symposium on Reliable Distributed Systems",
issn = "1060-9857",
publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Group communication protocols under errors

AU - Basile, Claudio

AU - Wang, Long

AU - Kalbarczyk, Zbigniew

AU - Iyer, Ravi

PY - 2003/12/1

Y1 - 2003/12/1

N2 - Group communication protocols constitute a basic building block for highly dependable distributed applications. Designing and correctly implementing a group communication system (GCS) is a difficult task. While many theoretical algorithms have been formalized and proved correct, only a few research projects have experimentally assessed the dependability of GCS implementations under complex error scenarios. This paper describes a thorough error-injection experimental campaign conducted on Ensemble, a popular GCS. By employing synthetic benchmark applications, we stress selected components of the GCS (the group membership service, the FIFO-ordered reliable multicast, and the sequencer-based, total-ordered reliable multicast) under various error models, including errors in memory (text and heap segments) and in network messages. The data show that about 5-6% of the failures are due to an error escaping Ensemble's error-containment mechanism and manifesting as a fail-silence violation. This constitutes an impediment to achieving high dependability, the natural objective of GCSs. Our results are derived for a particular system (Ensemble), and more investigation involving other GCSs is required to generalize the conclusions. Nevertheless, through an accurate analysis of the failure causes and the error propagation patterns, this paper offers insights into the design and implementation of robust GCSs.

UR - http://www.scopus.com/inward/record.url?scp=17644413377&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17644413377&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:17644413377

SP - 35

EP - 44

JO - Proceedings of the IEEE Symposium on Reliable Distributed Systems

JF - Proceedings of the IEEE Symposium on Reliable Distributed Systems

SN - 1060-9857

ER -