TY - GEN
T1 - MIRAS
T2 - 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
AU - Yang, Zhe
AU - Nguyen, Phuong
AU - Jin, Haiming
AU - Nahrstedt, Klara
N1 - Funding Information:
This work is supported by the National Science Foundation under grant NSF 1827126.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - Microservice, an architectural design that decomposes applications into loosely coupled services, is adopted in modern software design, including cloud-based scientific workflow processing. The microservice design makes scientific workflow systems more modular, more flexible, and easier to develop. However, cloud deployment of microservice workflow execution systems does not come for free, and proper resource management decisions have to be made in order to achieve a certain performance objective (e.g., response time) within a constrained operation cost. Nevertheless, effective online resource allocation decisions are hard to achieve due to dynamic workloads and the complicated interactions of microservices in each workflow. In this paper, we propose an adaptive resource allocation approach for microservice workflow systems based on recent advances in reinforcement learning. Our approach (1) assumes little prior knowledge of the microservice workflow system and does not require any elaborately designed model or crafted representative simulator of the underlying system, and (2) avoids the high sample complexity that is a common drawback of model-free reinforcement learning when applied to real-world scenarios. We show that our proposed approach automatically achieves an effective resource allocation policy with a limited number of time-consuming interactions with the microservice workflow system. We perform extensive evaluations to validate the effectiveness of our approach and demonstrate that it outperforms existing resource allocation approaches on real-world emulated workflows.
AB - Microservice, an architectural design that decomposes applications into loosely coupled services, is adopted in modern software design, including cloud-based scientific workflow processing. The microservice design makes scientific workflow systems more modular, more flexible, and easier to develop. However, cloud deployment of microservice workflow execution systems does not come for free, and proper resource management decisions have to be made in order to achieve a certain performance objective (e.g., response time) within a constrained operation cost. Nevertheless, effective online resource allocation decisions are hard to achieve due to dynamic workloads and the complicated interactions of microservices in each workflow. In this paper, we propose an adaptive resource allocation approach for microservice workflow systems based on recent advances in reinforcement learning. Our approach (1) assumes little prior knowledge of the microservice workflow system and does not require any elaborately designed model or crafted representative simulator of the underlying system, and (2) avoids the high sample complexity that is a common drawback of model-free reinforcement learning when applied to real-world scenarios. We show that our proposed approach automatically achieves an effective resource allocation policy with a limited number of time-consuming interactions with the microservice workflow system. We perform extensive evaluations to validate the effectiveness of our approach and demonstrate that it outperforms existing resource allocation approaches on real-world emulated workflows.
KW - Microservice
KW - Reinforcement learning
KW - Resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85074863396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074863396&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2019.00021
DO - 10.1109/ICDCS.2019.00021
M3 - Conference contribution
AN - SCOPUS:85074863396
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 122
EP - 132
BT - Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 July 2019 through 9 July 2019
ER -