Abstract
Efficient big data analytics over the wide-area network (WAN) is becoming increasingly popular. Current geo-distributed analytics (GDA) systems employ WAN-aware optimizations to tackle WAN heterogeneities. Although extensive measurements on public clouds suggest the potential for improving inter-datacenter data transfers via detours, we show that such optimizations are unlikely to work in practice. This is because the widely accepted mantra used in a large body of literature – WAN bandwidth has high variability – can be misleading. Instead, our measurements across 40 datacenters belonging to Amazon EC2, Microsoft Azure, and Google Cloud Platform show that the available WAN bandwidth is often spatially homogeneous and temporally stable between two virtual machines (VMs) in different datacenters, even though it can be heterogeneous at the TCP flow level. Moreover, there is little scope for either bandwidth or latency optimization in a cost-effective manner via relaying. We believe that these findings will motivate the community to rethink the design rationales of GDA systems and geo-distributed services.
Original language | English (US) |
---|---|
State | Published - 2018 |
Externally published | Yes |
Event | 10th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2018 - Boston, United States; Duration: Jul 9 2018 → … |
ASJC Scopus subject areas
- Computer Networks and Communications
- Software