TY - JOUR
T1 - On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
AU - Mao, Weichao
AU - Yang, Lin F.
AU - Zhang, Kaiqing
AU - Başar, Tamer
N1 - We thank Zihan Zhang and Chen-Yu Wei for helpful discussions and feedback. Research of W.M. and T.B. was supported in part by the ONR MURI Grant N00014-16-1-2710 and in part by the IBM-Illinois Discovery Accelerator Institute. Research of L.Y. was supported in part by DARPA grant HR00112190130. K.Z. was supported by the Simons-Berkeley Research Fellowship.
PY - 2022
Y1 - 2022
N2 - Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as the curse of multiagents. We address this challenge by investigating sample-efficient model-free algorithms in decentralized MARL, and aim to improve existing algorithms along this line. For learning (coarse) correlated equilibria in general-sum Markov games, we propose stage-based V-learning algorithms that significantly simplify the algorithmic design and analysis of recent works, and circumvent a rather complicated no-weighted-regret bandit subroutine. For learning Nash equilibria in Markov potential games, we propose an independent policy gradient algorithm with a decentralized momentum-based variance reduction technique. All our algorithms are decentralized in that each agent can make decisions based on only its local information. Neither communication nor centralized coordination is required during learning, leading to a natural generalization to a large number of agents. Finally, we provide numerical simulations to corroborate our theoretical findings.
UR - http://www.scopus.com/inward/record.url?scp=85163103006&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163103006&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163103006
SN - 2640-3498
VL - 162
SP - 15007
EP - 15049
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -