Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

Donghwan Lee, Hyungjin Yoon, Naira Hovakimyan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication network allows the agents to learn the global value function corresponding to a global reward, which is the sum of the local rewards. In this paper, the problem is converted into a constrained convex optimization problem with a consensus constraint. We then propose a primal-dual distributed GTD algorithm and prove that it almost surely converges to a set of stationary points of the optimization problem.
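The consensus-constrained formulation mentioned in the abstract can be illustrated with a toy problem. The sketch below is not the paper's distributed GTD algorithm: it assumes each agent i holds a hypothetical scalar objective f_i(w) = 0.5 (w - c_i)^2, a hypothetical ring communication graph, and a fixed step size, and it runs plain primal-dual gradient updates on the Lagrangian of "minimize the sum of f_i subject to L w = 0", where L is the graph Laplacian. All function names and parameters are illustrative assumptions.

```python
def ring_neighbors(n):
    """Neighbors of each agent on an undirected ring (assumed topology)."""
    return [((i - 1) % n, (i + 1) % n) for i in range(n)]


def laplacian_times(x, neighbors):
    """Compute (L x)_i = sum over neighbors j of (x_i - x_j)."""
    return [sum(x[i] - x[j] for j in neighbors[i]) for i in range(len(x))]


def primal_dual_consensus(targets, neighbors, step=0.02, iters=5000):
    """Toy primal-dual iteration for
        minimize   sum_i 0.5 * (w_i - targets[i])**2
        subject to L w = 0   (all local copies w_i agree).
    Each agent only uses its own target and its neighbors' values.
    """
    n = len(targets)
    w = [0.0] * n  # primal variables: local estimates
    v = [0.0] * n  # dual variables: multipliers for the consensus constraint
    for _ in range(iters):
        Lw = laplacian_times(w, neighbors)
        Lv = laplacian_times(v, neighbors)
        # Primal step: gradient descent on the Lagrangian in w.
        w_next = [w[i] - step * ((w[i] - targets[i]) + Lv[i]) for i in range(n)]
        # Dual step: gradient ascent on the Lagrangian in v (uses the old w).
        v = [v[i] + step * Lw[i] for i in range(n)]
        w = w_next
    return w


if __name__ == "__main__":
    estimates = primal_dual_consensus([1.0, 2.0, 3.0, 6.0], ring_neighbors(4))
    print(estimates)
```

At a fixed point of these updates the local estimates agree and equal the average of the targets, which is the minimizer of the summed objective. In this toy setting the targets stand in for the agents' local reward information, and the step size is chosen small enough for the four-agent ring used here; other graphs may need a smaller step.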

Original language: English (US)
Title of host publication: 2018 IEEE Conference on Decision and Control, CDC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1967-1972
Number of pages: 6
ISBN (Electronic): 9781538613955
State: Published - Jul 2 2018
Event: 57th IEEE Conference on Decision and Control, CDC 2018 - Miami, United States
Duration: Dec 17 2018 to Dec 19 2018

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2018-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 57th IEEE Conference on Decision and Control, CDC 2018
Country/Territory: United States
City: Miami
Period: 12/17/18 to 12/19/18

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
