Reliable on-demand management operations for large-scale distributed applications

Research output: Contribution to journal, Conference article, peer-review


This paper argues for attention to, and proposes a novel direction for solving, instant monitoring and management tasks for large-scale distributed applications running across hundreds of hosts. We present the MON (Management Overlay Networks) approach, which uses a novel concept called on-demand overlays to support instant commands such as queries and software pushes. On-demand overlays are built on-the-fly and probabilistically, by leveraging weakly-consistent gossip-style membership information underneath. Thus, they are lightweight in terms of memory, computation, and bandwidth. We augment on-demand overlays with several notions of application-specified reliability, and show how MON detects and adheres to these. MON is available atop PlanetLab, and we present experimental results. We conclude with a series of promising open problems in this direction.
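To make the idea of an on-demand overlay more concrete, the following is a minimal simulation sketch, not MON's published algorithm. It assumes that each host holds a small random partial view of peers (standing in for gossip-style membership), that the tree is grown on the fly by each reached host forwarding the command to up to `fanout` peers sampled from its own view, and that a host already in the overlay rejects a duplicate request. Weak consistency is modeled by letting views contain failed hosts whose requests are simply lost. The coverage check is one hypothetical example of an application-specified reliability target. The names `make_views`, `build_on_demand_tree`, `coverage`, `fanout`, and `required` are illustrative only.

```python
import random
from collections import deque


def make_views(members, view_size, rng):
    """Give every host a random partial view of the others (hypothetical gossip stand-in)."""
    views = {}
    for m in members:
        others = [p for p in members if p != m]
        views[m] = rng.sample(others, min(view_size, len(others)))
    return views


def build_on_demand_tree(views, root, fanout, rng, alive=None):
    """Grow a tree from `root` using only each host's local view.

    Returns a dict mapping each reached host to its parent (the root maps to None).
    """
    parent = {root: None}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        candidates = views.get(node, [])
        # Each host picks its children probabilistically from its own view.
        chosen = rng.sample(candidates, min(fanout, len(candidates)))
        for peer in chosen:
            if alive is not None and peer not in alive:
                continue  # stale membership entry: the peer has failed, request is lost
            if peer not in parent:  # hosts already in the overlay reject duplicates
                parent[peer] = node
                frontier.append(peer)
    return parent


def coverage(parent, hosts):
    """Fraction of the given hosts that the overlay reached."""
    return len(parent) / len(hosts)


rng = random.Random(42)
members = list(range(200))
views = make_views(members, view_size=20, rng=rng)
# Hypothetically, 10 hosts (never the root) have failed but still appear in views.
alive = set(members) - set(rng.sample(members[1:], 10))
tree = build_on_demand_tree(views, root=0, fanout=5, rng=rng, alive=alive)
cov = coverage(tree, alive)
required = 0.95  # hypothetical application-specified reliability target
print(f"reached {cov:.0%} of live hosts; target met: {cov >= required}")
```

Because the overlay is built only when a command is issued and discarded afterwards, no per-host state beyond the gossip view has to be maintained between commands, which is the intuition behind the abstract's claim that on-demand overlays are lightweight.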

Original language: English (US)
Pages (from-to): 82-88
Number of pages: 7
Journal: Operating Systems Review (ACM)
Issue number: 5
State: Published - Oct 1 2007
Event: Gossip-Based Computer Networking - Leiden, Netherlands
Duration: Dec 1 2006 - Dec 1 2006

Keywords

  • Instant commands
  • Monitoring
  • On-demand overlays
  • Reliability

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
