TY - GEN
T1 - Networked Multi-Agent Reinforcement Learning in Continuous Spaces
AU - Zhang, Kaiqing
AU - Yang, Zhuoran
AU - Başar, Tamer
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Many real-world tasks in practical control systems involve the learning and decision-making of multiple agents under limited communication and observation. In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where multiple agents perform reinforcement learning in a common environment and can exchange information via a possibly time-varying communication network. In particular, we focus on a collaborative MARL setting where each agent has an individual reward function, and the objective of all agents is to maximize the network-wide averaged long-term return. To this end, we propose a fully decentralized actor-critic algorithm that relies only on neighbor-to-neighbor communication among agents. To promote the use of the algorithm in practical control systems, we focus on the setting with continuous state and action spaces, and adopt the recently proposed expected policy gradient to reduce the variance of the gradient estimate. We provide convergence guarantees for the algorithm when linear function approximation is employed, and corroborate our theoretical results via simulations.
AB - Many real-world tasks in practical control systems involve the learning and decision-making of multiple agents under limited communication and observation. In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where multiple agents perform reinforcement learning in a common environment and can exchange information via a possibly time-varying communication network. In particular, we focus on a collaborative MARL setting where each agent has an individual reward function, and the objective of all agents is to maximize the network-wide averaged long-term return. To this end, we propose a fully decentralized actor-critic algorithm that relies only on neighbor-to-neighbor communication among agents. To promote the use of the algorithm in practical control systems, we focus on the setting with continuous state and action spaces, and adopt the recently proposed expected policy gradient to reduce the variance of the gradient estimate. We provide convergence guarantees for the algorithm when linear function approximation is employed, and corroborate our theoretical results via simulations.
UR - http://www.scopus.com/inward/record.url?scp=85062195297&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062195297&partnerID=8YFLogxK
U2 - 10.1109/CDC.2018.8619581
DO - 10.1109/CDC.2018.8619581
M3 - Conference contribution
AN - SCOPUS:85062195297
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 2771
EP - 2776
BT - 2018 IEEE Conference on Decision and Control, CDC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 57th IEEE Conference on Decision and Control, CDC 2018
Y2 - 17 December 2018 through 19 December 2018
ER -