TY - JOUR

T1 - Robust multi-agent reinforcement learning with model uncertainty

AU - Zhang, Kaiqing

AU - Sun, Tao

AU - Tao, Yunzhe

AU - Genc, Sahika

AU - Mallya, Sunil

AU - Basar, Tamer

N1 - Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.

PY - 2020

Y1 - 2020

N2 - In this work, we study the problem of multi-agent reinforcement learning (MARL) with model uncertainty, which is referred to as robust MARL. This is naturally motivated by some multi-agent applications where each agent may not have perfectly accurate knowledge of the model, e.g., all the reward functions of other agents. Little a priori work on MARL has accounted for such uncertainties, neither in problem formulation nor in algorithm design. In contrast, we model the problem as a robust Markov game, where the goal of all agents is to find policies such that no agent has the incentive to deviate, i.e., reach some equilibrium point, which is also robust to the possible uncertainty of the MARL model. We first introduce the solution concept of robust Nash equilibrium in our setting, and develop a Q-learning algorithm to find such equilibrium policies, with convergence guarantees under certain conditions. In order to handle possibly enormous state-action spaces in practice, we then derive the policy gradients for robust MARL, and develop an actor-critic algorithm with function approximation. Our experiments demonstrate that the proposed algorithm outperforms several baseline MARL methods that do not account for the model uncertainty, in several standard but uncertain cooperative and competitive MARL environments.

AB - In this work, we study the problem of multi-agent reinforcement learning (MARL) with model uncertainty, which is referred to as robust MARL. This is naturally motivated by some multi-agent applications where each agent may not have perfectly accurate knowledge of the model, e.g., all the reward functions of other agents. Little a priori work on MARL has accounted for such uncertainties, neither in problem formulation nor in algorithm design. In contrast, we model the problem as a robust Markov game, where the goal of all agents is to find policies such that no agent has the incentive to deviate, i.e., reach some equilibrium point, which is also robust to the possible uncertainty of the MARL model. We first introduce the solution concept of robust Nash equilibrium in our setting, and develop a Q-learning algorithm to find such equilibrium policies, with convergence guarantees under certain conditions. In order to handle possibly enormous state-action spaces in practice, we then derive the policy gradients for robust MARL, and develop an actor-critic algorithm with function approximation. Our experiments demonstrate that the proposed algorithm outperforms several baseline MARL methods that do not account for the model uncertainty, in several standard but uncertain cooperative and competitive MARL environments.

UR - http://www.scopus.com/inward/record.url?scp=85104918543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85104918543&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85104918543

SN - 1049-5258

VL - 2020-December

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Y2 - 6 December 2020 through 12 December 2020

ER -