Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data Learning

Barzan Mozafari, Radu Alexandru Burcuta, Alan Cabrera, Andrei Constantin, Derek Francis, David Grömling, Alekh Jindal, Maciej Konkolowicz, Valentin Marian Spac, Yongjoo Park, Russell Razo Carranzo, Nicholas Richardson, Abhishek Roy, Aayushi Srivastava, Isha Tarte, Brian Westphal, Chi Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data clouds in general, and cloud data warehouses (CDWs) in particular, have lowered the upfront expertise and infrastructure barriers, making it easy for a wider range of users to query large and diverse sources of data. This has made modern data pipelines more complex, harder to optimize, and therefore less resource efficient. As a result, the ongoing cost of data clouds can easily become prohibitive. Further, since CDWs are general-purpose solutions that must serve a wide range of workloads, their out-of-the-box performance is sub-optimal for any single workload. Data teams therefore spend significant effort manually optimizing their queries and cloud infrastructure to curb costs while achieving reasonable performance. Aside from the opportunity cost of diverting data teams from business goals, manual optimization of millions of constantly changing queries is simply daunting. To the best of our knowledge, Keebo's Warehouse Optimization is the first fully automated solution capable of making real-time optimization decisions that minimize the CDWs' overall cost while meeting the users' performance goals. Keebo learns from how users and applications interact with their CDW and uses its trained models to automatically optimize the warehouse settings, adjust its resources (e.g., compute, memory), scale it up or down, suspend or resume it, and self-correct in real time based on the impact of its own actions.

Original language: English (US)
Title of host publication: SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 239-251
Number of pages: 13
ISBN (Electronic): 9781450395076
State: Published - Jun 2023
Externally published: Yes
Event: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States
Duration: Jun 18 2023 → Jun 23 2023

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Country/Territory: United States
City: Seattle
Period: 6/18/23 → 6/23/23

Keywords

  • cloud data warehouse
  • data learning
  • reinforcement learning
  • warehouse optimization

ASJC Scopus subject areas

  • Software
  • Information Systems
