Global analytics in the face of bandwidth and regulatory constraints

Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Thomas Jungblut, Jitu Padhye, George Varghese

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Global-scale organizations produce large volumes of data across geographically distributed data centers. Querying and analyzing such data as a whole introduces new research issues at the intersection of networks and databases. Today, systems that compute SQL analytics over geographically distributed data operate by pulling all data to a central location. This is problematic at large data scales due to expensive transoceanic links, and may be rendered impossible by emerging regulatory constraints. The new problem of Wide-Area Big Data (WABD) consists in orchestrating query execution across data centers to minimize bandwidth while respecting regulatory constraints. WABD combines classical query planning with novel network-centric mechanisms designed for a wide-area setting, such as pseudo-distributed execution, joint query optimization, and deltas on cached subquery results. Our prototype, Geode, builds upon Hive and uses 250× less bandwidth than centralized analytics in a Microsoft production workload and up to 360× less on popular analytics benchmarks including TPC-CH and Berkeley Big Data. Geode supports all SQL operators, including Joins, across global data.

Original language: English (US)
Title of host publication: Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015
Publisher: USENIX
Pages: 323-336
Number of pages: 14
ISBN (Electronic): 9781931971218
State: Published - 2015
Event: 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015 - Oakland, United States
Duration: May 4 2015 → May 6 2015

Publication series

Name: Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015

Other

Other: 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015
Country/Territory: United States
City: Oakland
Period: 5/4/15 → 5/6/15

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Networks and Communications
