Fighting fire with fire: Using randomized gossip to combat stochastic scalability limits

Indranil Gupta, Kenneth P. Birman, Robbert Van Renesse

Research output: Contribution to journalArticlepeer-review


The mechanisms used to improve the reliability of distributed systems often limit performance and scalability. Focusing on one widely-used definition of reliability, we explore the origins of this phenomenon and conclude that it reflects a tradeoff arising deep within the typical protocol stack. Specifically, we suggest that protocol designs often disregard the high cost of infrequent events. When a distributed system is scaled, both the frequency and the overall cost of such events often grow with the size of the system. This triggers an O(n2) phenomenon, which becomes visible above some threshold sizes. Our findings suggest that it would be more effective to construct large-scale reliable systems where, unlike traditional protocol stacks, lower layers use randomized mechanisms, with probabilistic guarantees, to overcome low-probability events. Reliability and other end-to-end properties are introduced closer to the application. We employ a back-of-the-envelope analysis to quantify this phenomenon for a class of strongly reliable multicast problems. We construct a non-traditional stack, as described above, that implements virtually synchronous multicast. Experimental results reveal that virtual synchrony over a nontraditional, probabilistic stack helps break through the scalability barrier faced by traditional implementations of the protocol.

Original languageEnglish (US)
Pages (from-to)165-184
Number of pages20
JournalQuality and Reliability Engineering International
Issue number3
StatePublished - May 2002
Externally publishedYes


  • Group communication
  • Non-traditional stack
  • Randomized algorithms
  • Reliable multicast
  • Scalability
  • Virtual synchrony

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Management Science and Operations Research


Dive into the research topics of 'Fighting fire with fire: Using randomized gossip to combat stochastic scalability limits'. Together they form a unique fingerprint.

Cite this