Scalable fault-tolerant aggregation in large process groups

Indranil Gupta, Robbert Van Renesse, Kenneth P. Birman

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

This paper discusses fault-tolerant, scalable solutions to the problem of accurately calculating global aggregate functions in large process groups communicating over unreliable networks. These groups could represent sensors or processes communicating over a network that is either fixed (e.g., the Internet) or dynamic (e.g., multihop ad hoc). Group members are prone to failures. The ability to evaluate global aggregate properties (e.g., the average of sensor temperature readings) is important for higher-level coordination activities in such large groups. We first define the setting and the problem, laying down metrics for evaluating different algorithms that solve it. We discuss why the usual approaches to this problem are unviable and unscalable over an unreliable network prone to message delivery failures and crash failures. We then propose a technique to impose an abstract hierarchy on such large groups, describing how this hierarchy can be made to mirror the network topology. We discuss several alternative ways of using this technique to solve the global aggregate function evaluation problem. Finally, we present a gossip-based protocol that uses this hierarchical technique, along with mathematical analysis and performance results that validate the robustness, efficiency, and accuracy of the Hierarchical Gossiping algorithm.
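The abstract describes hierarchical gossiping only at a high level. As a rough illustration, the Python sketch below simulates one plausible form of it for computing an average. It is not the authors' protocol, and every detail here is an assumption: members are numbered 0..N-1 and placed in nested boxes by integer division of their index by powers of a branching factor k (the paper instead describes a hierarchy that can mirror the network topology); each phase runs a fixed number of synchronous push-gossip rounds inside one box; messages are dropped independently with probability p_loss; crashed members are simply absent from the start; and when two summaries of the same child box disagree, the one covering more members is kept. The function name hierarchical_gossip_average and all of its parameters are hypothetical.

```python
import random


def hierarchical_gossip_average(values, k=4, rounds=6, fanout=2, p_loss=0.1,
                                crashed=frozenset(), seed=0):
    """Hypothetical simulation of hierarchical gossip for the average of live values.

    Assumption: the hierarchy is built by integer division of member indices by
    powers of k, not from network topology. Returns {member: (estimate, count)},
    where count is how many live members' values that member's estimate covers.
    """
    rng = random.Random(seed)
    n = len(values)
    alive = [m for m in range(n) if m not in crashed]  # crashed members never participate

    # Enough levels that a single top-level box contains every member.
    levels = 0
    while k ** levels < n:
        levels += 1
    levels = max(levels, 1)

    # Each member's view: {child box id: (sum, count)}. In phase 1 the child
    # boxes are the individual members themselves (level-0 boxes).
    known = {m: {m: (values[m], 1)} for m in alive}

    for level in range(1, levels + 1):
        width = k ** level
        # In this phase, members gossip only with live members of their level-`level` box.
        groups = {}
        for m in alive:
            groups.setdefault(m // width, []).append(m)

        for _ in range(rounds):
            # Synchronous round: every message is a snapshot taken before any delivery.
            outbox = []
            for m in alive:
                peers = groups[m // width]
                for target in rng.sample(peers, min(fanout, len(peers))):
                    if rng.random() >= p_loss:  # unreliable network: drop with prob. p_loss
                        outbox.append((target, dict(known[m])))
            for target, msg in outbox:
                view = known[target]
                for box, (s, c) in msg.items():
                    # Assumed merge rule: keep the more complete summary of a child box.
                    if box not in view or c > view[box][1]:
                        view[box] = (s, c)

        # End of phase: fold the known child-box summaries into one summary of this box.
        for m in alive:
            s = sum(sc[0] for sc in known[m].values())
            c = sum(sc[1] for sc in known[m].values())
            known[m] = {m // width: (s, c)}

    result = {}
    for m in alive:
        s, c = next(iter(known[m].values()))  # exactly one entry: the top-level box
        result[m] = (s / c, c)
    return result


if __name__ == "__main__":
    readings = [float(i) for i in range(64)]
    for member, (estimate, count) in sorted(hierarchical_gossip_average(readings).items())[:4]:
        print(member, round(estimate, 2), count)
```

If fanout is at least as large as every box, each phase fully disseminates in a single round and every member recovers the exact average of the live members. With message loss and few rounds, a member's estimate is instead the average of some subset of live values, and the returned count shows how much of the group that estimate actually covers.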

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Dependable Systems and Networks
Editors: D.C. Young
Pages: 433-442
Number of pages: 10
State: Published - Dec 1 2001
Externally published: Yes
Event: Proceedings of the International Conference on Dependable Systems and Networks - Goteborg, Sweden
Duration: Jul 1 2001 to Jul 4 2001

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Other

Other: Proceedings of the International Conference on Dependable Systems and Networks
Country: Sweden
City: Goteborg
Period: 7/1/01 to 7/4/01

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
