TY - JOUR
T1 - AVMON
T2 - Optimal and scalable discovery of consistent availability monitoring overlays for distributed systems
AU - Morales, Ramsés
AU - Gupta, Indranil
N1 - Funding Information:
This paper is an extended version of our previous conference paper at ICDCS 2007 [1]. This work is supported in part by US National Science Foundation (NSF) CAREER Grant CNS-0448246 and in part by NSF ITR Grant CMS-0427089.
PY - 2009
Y1 - 2009
N2 - This paper proposes to build overlays that help in monitoring of long-term availability histories of hosts, with a focus on large-scale distributed settings where hosts may be selfish or colluding. Concretely, we target the important problems of selection and discovery of an availability monitoring overlay. We motivate six significant goals - firstly, consistency, verifiability, and randomness, in selecting availability monitors of nodes, so as to be probabilistically resilient to selfish and colluding nodes. The next three goals are discoverability, load-balancing, and scalability in finding these monitors. We present AVMON, the first availability monitoring overlay to satisfy these six requirements. Our core algorithmic contribution is a range of protocols for discovering the availability monitoring overlay scalably and efficiently, given any arbitrary monitor selection scheme that is consistent and verifiable. We mathematically analyze the performance of AVMON's discovery protocols w.r.t. scalability and discovery time of monitors. Most interestingly, we are able to derive optimal (and practical) variants of AVMON, that minimize different combinations of memory, bandwidth, computation, and monitor discovery time. Finally, our extensive experimental evaluations using three types of availability traces - synthetic, from PlanetLab, and from the a peer-to-peer system (Overnet). Our results demonstrate AVMON's practicality in a variety of distributed systems.
AB - This paper proposes to build overlays that help in monitoring of long-term availability histories of hosts, with a focus on large-scale distributed settings where hosts may be selfish or colluding. Concretely, we target the important problems of selection and discovery of an availability monitoring overlay. We motivate six significant goals - firstly, consistency, verifiability, and randomness, in selecting availability monitors of nodes, so as to be probabilistically resilient to selfish and colluding nodes. The next three goals are discoverability, load-balancing, and scalability in finding these monitors. We present AVMON, the first availability monitoring overlay to satisfy these six requirements. Our core algorithmic contribution is a range of protocols for discovering the availability monitoring overlay scalably and efficiently, given any arbitrary monitor selection scheme that is consistent and verifiable. We mathematically analyze the performance of AVMON's discovery protocols w.r.t. scalability and discovery time of monitors. Most interestingly, we are able to derive optimal (and practical) variants of AVMON, that minimize different combinations of memory, bandwidth, computation, and monitor discovery time. Finally, our extensive experimental evaluations using three types of availability traces - synthetic, from PlanetLab, and from the a peer-to-peer system (Overnet). Our results demonstrate AVMON's practicality in a variety of distributed systems.
KW - Availability
KW - Churn
KW - Consistency
KW - Distributed systems
KW - Monitoring
KW - Optimality
KW - Overlay
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=67649882733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649882733&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2008.84
DO - 10.1109/TPDS.2008.84
M3 - Article
AN - SCOPUS:67649882733
SN - 1045-9219
VL - 20
SP - 446
EP - 459
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 4
ER -