TY - GEN
T1 - Making Data Clouds Smarter at Keebo
T2 - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
AU - Mozafari, Barzan
AU - Burcuta, Radu Alexandru
AU - Cabrera, Alan
AU - Constantin, Andrei
AU - Francis, Derek
AU - Grömling, David
AU - Jindal, Alekh
AU - Konkolowicz, Maciej
AU - Marian Spac, Valentin
AU - Park, Yongjoo
AU - Carranzo, Russell Razo
AU - Richardson, Nicholas
AU - Roy, Abhishek
AU - Srivastava, Aayushi
AU - Tarte, Isha
AU - Westphal, Brian
AU - Zhang, Chi
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/6
Y1 - 2023/6
AB - Data clouds in general, and cloud data warehouses (CDWs) in particular, have lowered the upfront expertise and infrastructure barriers, making it easy for a wider range of users to query large and diverse sources of data. This has made modern data pipelines more complex, harder to optimize, and therefore less resource efficient. As a result, the ongoing cost of data clouds can easily become prohibitively expensive. Further, since CDWs are general-purpose solutions that must serve a wide range of workloads, their out-of-the-box performance is sub-optimal for any single workload. Data teams therefore spend significant effort manually optimizing their queries and cloud infrastructure to curb costs while achieving reasonable performance. Aside from the opportunity cost of diverting data teams from business goals, manual optimization of millions of constantly changing queries is simply daunting. To the best of our knowledge, Keebo's Warehouse Optimization is the first fully automated solution capable of making real-time optimization decisions that minimize the CDWs' overall cost while meeting the users' performance goals. Keebo learns from how users and applications interact with their CDW and uses its trained models to automatically optimize the warehouse settings, adjust its resources (e.g., compute, memory), scale it up or down, suspend or resume it, and also self-correct in real time based on the impact of its own actions.
KW - cloud data warehouse
KW - data learning
KW - reinforcement learning
KW - warehouse optimization
UR - http://www.scopus.com/inward/record.url?scp=85162869900&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85162869900&partnerID=8YFLogxK
U2 - 10.1145/3555041.3589681
DO - 10.1145/3555041.3589681
M3 - Conference contribution
AN - SCOPUS:85162869900
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 239
EP - 251
BT - SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 18 June 2023 through 23 June 2023
ER -