TY - GEN
T1 - When Green Computing Meets Performance and Resilience SLOs
AU - Qiu, Haoran
AU - Mao, Weichao
AU - Wang, Chen
AU - Jha, Saurabh
AU - Franke, Hubertus
AU - Narayanaswami, Chandra
AU - Kalbarczyk, Zbigniew
AU - Basar, Tamer
AU - Iyer, Ravishankar
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on computing infrastructures, such as hyper-scale cloud datacenters, that consume significant power, thus producing increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which are desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers' requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. Underlying the ML-centric solutions must be methods to combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before.
KW - cloud computing
KW - green energy
KW - machine learning
KW - machine learning resilience
KW - resilience
KW - sustainability
UR - http://www.scopus.com/inward/record.url?scp=85203825669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203825669&partnerID=8YFLogxK
U2 - 10.1109/DSN-S60304.2024.00015
DO - 10.1109/DSN-S60304.2024.00015
M3 - Conference contribution
AN - SCOPUS:85203825669
T3 - Proceedings - 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024
SP - 17
EP - 22
BT - Proceedings - 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024
Y2 - 24 June 2024 through 27 June 2024
ER -