Reinforcement learning for resource management in multi-tenant serverless platforms

Haoran Qiu, Weichao Mao, Archit Patke, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce function tail latency and improve resource utilization, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to resource management. Compared to existing heuristics-based resource management approaches, RL-based approaches remove humans from the loop and avoid the painstaking crafting of heuristics. In this paper, we show that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-agent RL algorithm based on Proximal Policy Optimization, i.e., multi-agent PPO (MA-PPO). We show that in multi-tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to that of S-RL in single-tenant cases, with less than 10% degradation. In addition, MA-PPO improves on S-RL performance (in terms of function tail latency) by 4.4x in multi-tenant cases.
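The abstract names Proximal Policy Optimization (PPO) as the basis of MA-PPO. As rough orientation only, here is a minimal sketch of the standard PPO clipped surrogate objective, evaluated separately for several agents. It assumes, hypothetically, one independent learner per tenant that optimizes only on its own samples; the paper's actual MA-PPO customizations, state and action spaces, and reward design are not described in this record, and the names ppo_clipped_objective and per_agent_objectives are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

# One agent's batch: (new log-probs, old log-probs, advantage estimates).
Batch = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def ppo_clipped_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Standard PPO clipped surrogate objective (to be maximized).

    For each sample, r = exp(new_logp - old_logp) is the probability ratio
    between the current policy and the policy that collected the data.
    The objective averages min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio so a single update cannot move the policy too far.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def per_agent_objectives(
    batches: Dict[str, Batch], clip_eps: float = 0.2
) -> Dict[str, float]:
    """Hypothetical multi-agent arrangement: one PPO learner per tenant.

    Each agent's objective depends only on its own batch, so agents can be
    updated independently (an assumption, not the paper's stated design).
    """
    return {
        agent: ppo_clipped_objective(new, old, adv, clip_eps=clip_eps)
        for agent, (new, old, adv) in batches.items()
    }
```

For example, per_agent_objectives({"fn-a": ([0.0], [0.0], [1.0])}) returns {"fn-a": 1.0}, since an unchanged policy gives a ratio of 1. Keeping each agent's objective separate is one simple way to avoid the shared, non-stationary learning signal that a single agent faces when many tenants share a platform; whether MA-PPO decomposes the problem this way is not stated here.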

Original language: English (US)
Title of host publication: EuroMLSys 2022 - Proceedings of the 2nd European Workshop on Machine Learning and Systems
Publisher: Association for Computing Machinery
Pages: 20-28
Number of pages: 9
ISBN (Electronic): 9781450392549
State: Published - Apr 5 2022
Event: 2nd European Workshop on Machine Learning and Systems, EuroMLSys 2022, in conjunction with ACM EuroSys 2022 - Virtual, Online, France
Duration: Apr 5 2022 to Apr 8 2022

Publication series

Name: EuroMLSys 2022 - Proceedings of the 2nd European Workshop on Machine Learning and Systems

Conference

Conference: 2nd European Workshop on Machine Learning and Systems, EuroMLSys 2022, in conjunction with ACM EuroSys 2022
Country/Territory: France
City: Virtual, Online
Period: 4/5/22 to 4/8/22

Keywords

  • function-as-a-service
  • multi-agent
  • reinforcement learning
  • resource allocation
  • serverless computing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Human-Computer Interaction
  • Information Systems
  • Software
  • Control and Systems Engineering
