TY - JOUR
T1 - Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations
AU - uz Zaman, Muhammad Aneeq
AU - Miehling, Erik
AU - Başar, Tamer
N1 - Funding Information:
Research leading to this work was supported in part by AFOSR Grant FA9550-19-1-0353. This article is part of the topical collection “Multi-agent Dynamic Decision Making and Learning” edited by Konstantin Avrachenkov, Vivek S. Borkar and U. Jayakrishnan Nair.
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/3
Y1 - 2023/3
N2 - Scalability of reinforcement learning algorithms to multi-agent systems is a significant bottleneck to their practical use. In this paper, we approach multi-agent reinforcement learning from a mean-field game perspective, where the number of agents tends to infinity. Our analysis focuses on the structured setting of systems with linear dynamics and quadratic costs, known as linear–quadratic mean-field games, evolving over a discrete-time infinite horizon, where the agents are assumed to be partitioned into finitely many populations connected by a network of known structure. The functional forms of the agents’ costs and dynamics are assumed to be the same within a population but to differ across populations. We first characterize the equilibrium of the mean-field game, which further prescribes an ϵ-Nash equilibrium for the finite-population game. Our main focus is on the design of a learning algorithm, based on zero-order stochastic optimization, for computing mean-field equilibria. The algorithm exploits the affine structure of both the equilibrium controller and the equilibrium mean-field trajectory by decomposing the learning task into first learning the linear terms and then learning the affine terms. We present a convergence proof and a finite-sample bound quantifying the estimation error as a function of the number of samples.
AB - Scalability of reinforcement learning algorithms to multi-agent systems is a significant bottleneck to their practical use. In this paper, we approach multi-agent reinforcement learning from a mean-field game perspective, where the number of agents tends to infinity. Our analysis focuses on the structured setting of systems with linear dynamics and quadratic costs, known as linear–quadratic mean-field games, evolving over a discrete-time infinite horizon, where the agents are assumed to be partitioned into finitely many populations connected by a network of known structure. The functional forms of the agents’ costs and dynamics are assumed to be the same within a population but to differ across populations. We first characterize the equilibrium of the mean-field game, which further prescribes an ϵ-Nash equilibrium for the finite-population game. Our main focus is on the design of a learning algorithm, based on zero-order stochastic optimization, for computing mean-field equilibria. The algorithm exploits the affine structure of both the equilibrium controller and the equilibrium mean-field trajectory by decomposing the learning task into first learning the linear terms and then learning the affine terms. We present a convergence proof and a finite-sample bound quantifying the estimation error as a function of the number of samples.
KW - Large population games on networks
KW - Mean-field games
KW - Multi-agent reinforcement learning
KW - Zero-order stochastic optimization
UR - http://www.scopus.com/inward/record.url?scp=85129708166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129708166&partnerID=8YFLogxK
U2 - 10.1007/s13235-022-00448-w
DO - 10.1007/s13235-022-00448-w
M3 - Article
AN - SCOPUS:85129708166
SN - 2153-0785
VL - 13
SP - 118
EP - 164
JO - Dynamic Games and Applications
JF - Dynamic Games and Applications
IS - 1
ER -