A multi-agent off-policy actor-critic algorithm for distributed reinforcement learning

Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

Research output: Contribution to journal › Conference article › peer-review


This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm. An empirical validation of these theoretical results is given.
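The two ingredients named in the abstract, emphatic temporal difference learning with linear function approximation and neighbor-to-neighbor communication over a graph, can be illustrated with a small sketch. This is not the paper's algorithm: it is the standard single-agent emphatic TD(0) update with unit interest, paired with a generic consensus-averaging step. The function names, the step size `alpha`, the discount `gamma`, and the mixing matrix `weights` are all assumptions made for illustration.

```python
# Hypothetical sketch, not the authors' method: standard emphatic TD(0)
# with linear features, plus a generic consensus-averaging step.

def etd0_step(theta, x, x_next, reward, rho, followon, alpha=0.05, gamma=0.9):
    """One emphatic TD(0) update of the weight vector theta.

    x, x_next : feature vectors of the current and next state
    rho       : importance-sampling ratio (target prob. / behavior prob.)
    followon  : follow-on trace F_t; with lambda = 0 the emphasis equals F_t
    """
    v = sum(t * f for t, f in zip(theta, x))
    v_next = sum(t * f for t, f in zip(theta, x_next))
    delta = reward + gamma * v_next - v        # TD error
    step = alpha * rho * followon * delta      # emphasis-weighted step
    return [t + step * f for t, f in zip(theta, x)]


def next_followon(followon, rho, gamma=0.9):
    """Follow-on trace recursion with unit interest: F' = gamma * rho * F + 1."""
    return gamma * rho * followon + 1.0


def consensus_step(thetas, weights):
    """Each agent i replaces its parameters by sum_j weights[i][j] * thetas[j].

    weights is assumed to be doubly stochastic and to respect the current
    communication graph (weights[i][j] == 0 when i and j are not neighbors).
    """
    n, d = len(thetas), len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
```

In a loop, each agent would apply `etd0_step` using its own local reward and then call `consensus_step` with the mixing matrix for the current graph. With a doubly stochastic matrix, the average of the agents' parameters is unchanged by the mixing step, which is the usual reason for that assumption.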

Original language: English (US)
Pages (from-to): 1549-1554
Number of pages: 6
State: Published - 2020
Externally published: Yes
Event: 21st IFAC World Congress 2020 - Berlin, Germany
Duration: Jul 12 2020 to Jul 17 2020


Keywords

  • Adaptive control of multi-agent systems
  • Consensus and reinforcement learning control

ASJC Scopus subject areas

  • Control and Systems Engineering

