TY - JOUR
T1 - Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning with Networked Agents
AU - Zhang, Kaiqing
AU - Yang, Zhuoran
AU - Liu, Han
AU - Zhang, Tong
AU - Basar, Tamer
N1 - Funding Information:
This work was supported in part by the US Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, and in part by the Air Force Office of Scientific Research (AFOSR) Grant FA9550-19-1-0353.
Publisher Copyright:
© 1963-2012 IEEE.
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Despite the increasing interest in multiagent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this article, we address this problem by providing a finite-sample analysis for decentralized batch MARL. Specifically, we consider a type of mixed MARL setting with both cooperative and competitive agents, where two teams of agents compete in a zero-sum game setting, while the agents within each team collaborate by communicating over a time-varying network. This setting covers many conventional MARL settings in the literature. We then develop batch MARL algorithms that can be implemented in a decentralized fashion, and quantify the finite-sample errors of the estimated action-value functions. Our error analysis captures how the function class, the number of samples within each iteration, and the number of iterations determine the statistical accuracy of the proposed algorithms. Our results, compared to the finite-sample bounds for single-agent reinforcement learning, involve additional error terms caused by decentralized computation, which is inherent in our decentralized MARL setting. This article provides the first finite-sample analysis for batch MARL, a step toward rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.
KW - Approximation algorithms
KW - Function approximation
KW - Game theory
KW - Games
KW - Heuristic algorithms
KW - Markov processes
KW - Reinforcement learning
KW - Multiagent systems
KW - Networked control systems
KW - Statistical learning
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85099416089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099416089&partnerID=8YFLogxK
U2 - 10.1109/TAC.2021.3049345
DO - 10.1109/TAC.2021.3049345
M3 - Article
AN - SCOPUS:85099416089
SN - 0018-9286
VL - 66
SP - 5925
EP - 5940
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 12
ER -