TY - GEN
T1 - Bobtail
T2 - 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013
AU - Xu, Yunjing
AU - Musgrave, Zachary
AU - Noble, Brian
AU - Bailey, Michael
N1 - Funding Information:
We are grateful to the anonymous reviewers and our shepherd, George Porter, for their comments on this paper. This work was supported in part by the Department of Homeland Security (DHS) under contract numbers D08PC75388 and FA8750-12-2-0314, the National Science Foundation (NSF) under contract numbers CNS 1111699, CNS 091639, CNS 08311174, and CNS 0751116, and the Department of the Navy under contract N00014-09-1-1042.
Publisher Copyright:
© Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013. All rights reserved.
PY - 2013
Y1 - 2013
AB - Highly modular data center applications such as Bing, Facebook, and Amazon's retail platform are known to be susceptible to long tails in response times. Services such as Amazon's EC2 have proven attractive platforms for building similar applications. Unfortunately, virtualization used in such platforms exacerbates the long tail problem by factors of two to four. Surprisingly, we find that poor response times in EC2 are a property of nodes rather than the network, and that this property of nodes is both pervasive throughout EC2 and persistent over time. The root cause of this problem is co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that proactively detects and avoids these bad neighboring VMs without significantly penalizing node instantiation. With Bobtail, common communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.
UR - http://www.scopus.com/inward/record.url?scp=85076715564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076715564&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85076715564
T3 - Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013
SP - 329
EP - 341
BT - Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013
PB - USENIX Association
Y2 - 2 April 2013 through 5 April 2013
ER -