When Green Computing Meets Performance and Resilience SLOs

Haoran Qiu, Weichao Mao, Chen Wang, Saurabh Jha, Hubertus Franke, Chandra Narayanaswami, Zbigniew Kalbarczyk, Tamer Basar, Ravishankar Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on computing infrastructures, such as hyper-scale cloud datacenters, that consume significant power and thus produce increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which are desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers' requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. These ML-centric solutions must be underpinned by methods that combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before.

Original language: English (US)
Title of host publication: Proceedings - 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 17-22
Number of pages: 6
ISBN (Electronic): 9798350395709
State: Published - 2024
Externally published: Yes
Event: 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024 - Brisbane, Australia
Duration: Jun 24 2024 to Jun 27 2024

Publication series

Name: Proceedings - 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024

Conference

Conference: 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2024
Country/Territory: Australia
City: Brisbane
Period: 6/24/24 to 6/27/24

Keywords

  • cloud computing
  • green energy
  • machine learning
  • machine learning resilience
  • resilience
  • sustainability

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Software
  • Safety, Risk, Reliability and Quality
