TY - GEN
T1 - WANalytics
T2 - ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
AU - Vulimiri, Ashish
AU - Curino, Carlo
AU - Godfrey, P. Brighten
AU - Jungblut, Thomas
AU - Karanasos, Konstantinos
AU - Padhye, Jitu
AU - Varghese, George
PY - 2015/5/27
Y1 - 2015/5/27
AB - Many large organizations collect massive volumes of data each day in a geographically distributed fashion, at data centers around the globe. Despite their geographically diverse origin, the data must be processed and analyzed as a whole to extract insight. We call the problem of supporting large-scale geo-distributed analytics Wide-Area Big Data (WABD). To the best of our knowledge, WABD is currently addressed by copying all the data to a central data center where the analytics are run. This approach consumes expensive cross-data center bandwidth and is incompatible with data sovereignty restrictions that are starting to take shape. We instead propose WANalytics, a system that solves the WABD problem by orchestrating distributed query execution and adjusting data replication across data centers in order to minimize bandwidth usage, while respecting sovereignty requirements. WANalytics achieves up to a 360× reduction in data transfer cost when compared to the centralized approach on both real Microsoft production workloads and standard synthetic benchmarks, including TPC-CH and Berkeley Big-Data. In this demonstration, attendees will interact with a live geo-scale multi-data center deployment of WANalytics, allowing them to experience the data transfer reduction our system achieves, and to explore how it dynamically adapts execution strategy in response to changes in the workload and environment.
UR - http://www.scopus.com/inward/record.url?scp=84957577244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957577244&partnerID=8YFLogxK
U2 - 10.1145/2723372.2735365
DO - 10.1145/2723372.2735365
M3 - Conference contribution
AN - SCOPUS:84957577244
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1087
EP - 1092
BT - SIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 31 May 2015 through 4 June 2015
ER -