Decentralized Learning of Finite-Memory Policies in Dec-POMDPs

Weichao Mao, Kaiqing Zhang, Zhuoran Yang, Tamer Basar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multi-agent reinforcement learning (MARL) under partial observability is notoriously challenging as the agents only have asymmetric partial observations of the system. In this paper, we study MARL in decentralized partially observable Markov decision processes (Dec-POMDPs) with partial history sharing. In search of decentralized and tractable MARL solutions, we identify the appropriate conditions under which we can adopt the common information approach to naturally extend existing single-agent policy learners to Dec-POMDPs. In particular, under the conditions of bounded local memories and an efficient representation of the common information, we present a MARL algorithm that learns a near-optimal finite-memory policy in Dec-POMDPs. We establish the iteration complexity of the algorithm, which depends only linearly on the number of agents. Simulations on classic Dec-POMDP tasks show that our approach significantly outperforms existing decentralized solutions, and nearly matches the centralized ones that require stronger informational assumptions.

Original language: English (US)
Title of host publication: IFAC-PapersOnLine
Editors: Hideaki Ishii, Yoshio Ebihara, Jun-ichi Imura, Masaki Yamakita
Publisher: Elsevier B.V.
Pages: 2601-2607
Number of pages: 7
Edition: 2
ISBN (Electronic): 9781713872344
State: Published - Jul 1 2023
Event: 22nd IFAC World Congress - Yokohama, Japan
Duration: Jul 9 2023 to Jul 14 2023

Publication series

Name: IFAC-PapersOnLine
Number: 2
Volume: 56
ISSN (Electronic): 2405-8963

Conference

Conference: 22nd IFAC World Congress
Country/Territory: Japan
City: Yokohama
Period: 7/9/23 to 7/14/23

Keywords

  • Multi-agent systems
  • decentralized control
  • decentralized optimization
  • machine learning
  • reinforcement learning

ASJC Scopus subject areas

  • Control and Systems Engineering
