SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management

Haoran Qiu, Weichao Mao, Archit Patke, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-"less" and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain service-level objectives (SLOs) and improve resource utilization efficiency, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to efficiently converge during training and provides online function latency performance comparable to that of S-RL trained in isolation with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.

Original language: English (US)
Title of host publication: SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing
Publisher: Association for Computing Machinery
Pages: 306-322
Number of pages: 17
ISBN (Electronic): 9781450394147
State: Published - Nov 7 2022
Event: 13th Annual ACM Symposium on Cloud Computing, SoCC 2022 - San Francisco, United States
Duration: Nov 7 2022 to Nov 11 2022

Publication series

Name: SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing

Conference

Conference: 13th Annual ACM Symposium on Cloud Computing, SoCC 2022
Country/Territory: United States
City: San Francisco
Period: 11/7/22 to 11/11/22

Keywords

  • multi-agent
  • reinforcement learning
  • serverless computing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Software
  • Computational Theory and Mathematics
  • Computer Science Applications
