TY - GEN
T1 - Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning
AU - Mao, Weichao
AU - Zhang, Kaiqing
AU - Miehling, Erik
AU - Basar, Tamer
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/14
Y1 - 2020/12/14
N2 - Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories - a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose an instance of the embedding based on recurrent neural networks (RNNs). The embedding is then used as an approximate information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing (partially observable) MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.
AB - Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories - a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose an instance of the embedding based on recurrent neural networks (RNNs). The embedding is then used as an approximate information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing (partially observable) MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.
UR - http://www.scopus.com/inward/record.url?scp=85099876096&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099876096&partnerID=8YFLogxK
U2 - 10.1109/CDC42340.2020.9303801
DO - 10.1109/CDC42340.2020.9303801
M3 - Conference contribution
AN - SCOPUS:85099876096
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 6124
EP - 6131
BT - 2020 59th IEEE Conference on Decision and Control, CDC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th IEEE Conference on Decision and Control, CDC 2020
Y2 - 14 December 2020 through 18 December 2020
ER -