TY - JOUR
T1 - Finite-sample analysis for decentralized cooperative multi-agent reinforcement learning from batch data
AU - Zhang, Kaiqing
AU - Yang, Zhuoran
AU - Liu, Han
AU - Zhang, Tong
AU - Basar, Tamer
N1 - Funding Information:
K. Zhang and T. Basar are supported in part by the US Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, and in part by the Office of Naval Research (ONR) MURI Grant N00014-16-1-2710. Z. Yang is supported by a Tencent PhD Fellowship. Due to space limitations, some definitions and proof details are deferred to the accompanying complete report (Zhang et al., 2018f).
Publisher Copyright:
Copyright © 2020 The Authors. This is an open access article under the CC BY-NC-ND license.
PY - 2020
Y1 - 2020
AB - In contrast to the great empirical success of multi-agent reinforcement learning (MARL), its theoretical understanding remains largely underdeveloped. As an initial attempt, we provide a finite-sample analysis for decentralized cooperative MARL with networked agents. In particular, we consider a team of cooperative agents connected by a time-varying communication network, with no central controller coordinating them. The goal for each agent is to maximize the long-term return associated with the team-average reward, by communicating only with its neighbors over the network. A batch MARL algorithm is developed for this setting, which can be implemented in a decentralized fashion. We then quantify the estimation errors of the action-value functions obtained from our algorithm, establishing their dependence on the function class, the number of samples in each iteration, and the number of iterations. This work appears to be the first finite-sample analysis for decentralized cooperative MARL from batch data.
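N1 - Editor's illustration: the abstract describes a batch MARL algorithm in which each agent fits an action-value function from batch data and communicates only with its neighbors over a network. The following is a minimal Python sketch of that general idea, namely decentralized fitted Q-iteration with linear features and a doubly stochastic consensus step. It is not the paper's exact algorithm; the feature map, network weights, and synthetic batch data below are all assumptions made for illustration.

# Minimal sketch (not the authors' exact algorithm): decentralized batch
# Q-learning with linear function approximation and neighbor averaging.
# All names (phi, W, the synthetic batch, etc.) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_ACTIONS, GAMMA = 4, 5, 3, 0.9
DIM = N_STATES * N_ACTIONS  # one-hot feature dimension

def phi(s, a):
    """One-hot feature vector for a state-action pair (illustrative)."""
    v = np.zeros(DIM)
    v[s * N_ACTIONS + a] = 1.0
    return v

# Ring communication network; W is doubly stochastic, so repeated
# neighbor-only averaging drives the agents' parameters toward consensus.
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i + 1) % N_AGENTS] = 0.25
    W[i, (i - 1) % N_AGENTS] = 0.25

# Synthetic batch: agents observe the shared (s, a, s') transitions but
# only their own local rewards; the team objective averages the rewards.
N_SAMPLES = 2000
S = rng.integers(N_STATES, size=N_SAMPLES)
A = rng.integers(N_ACTIONS, size=N_SAMPLES)
S_next = rng.integers(N_STATES, size=N_SAMPLES)
R = rng.normal(size=(N_AGENTS, N_SAMPLES))  # local rewards per agent

Phi = np.array([phi(s, a) for s, a in zip(S, A)])            # (n, d)
Phi_next = np.array([[phi(sp, a) for a in range(N_ACTIONS)]
                     for sp in S_next])                      # (n, |A|, d)

theta = np.zeros((N_AGENTS, DIM))  # each agent's local Q parameters

for _ in range(50):  # outer fitted-Q iterations
    new_theta = np.empty_like(theta)
    for i in range(N_AGENTS):
        # Bellman target built from the agent's own local reward and its
        # current Q estimate (greedy max over next actions).
        q_next = Phi_next @ theta[i]               # (n, |A|)
        y = R[i] + GAMMA * q_next.max(axis=1)
        # Local least-squares fit (ridge-regularized for stability).
        G = Phi.T @ Phi + 1e-6 * np.eye(DIM)
        new_theta[i] = np.linalg.solve(G, Phi.T @ y)
    # Consensus step: each agent mixes parameters with its neighbors only.
    theta = W @ new_theta

# After repeated mixing, all agents hold nearly the same Q estimate,
# which approximates the Q-function of the team-average reward.
print(np.abs(theta - theta.mean(axis=0)).max())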
KW - Decentralized optimization
KW - Finite-sample analysis
KW - Multi-agent systems
KW - Networked systems
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85105078079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105078079&partnerID=8YFLogxK
U2 - 10.1016/j.ifacol.2020.12.1290
DO - 10.1016/j.ifacol.2020.12.1290
M3 - Conference article
AN - SCOPUS:85105078079
VL - 53
SP - 1049
EP - 1056
JO - IFAC-PapersOnLine
JF - IFAC-PapersOnLine
SN - 2405-8963
IS - 2
T2 - 21st IFAC World Congress 2020
Y2 - 12 July 2020 through 17 July 2020
ER -