TY - GEN
T1 - Primal-Dual Algorithm for Distributed Reinforcement Learning
T2 - 57th IEEE Conference on Decision and Control, CDC 2018
AU - Lee, Donghwan
AU - Yoon, Hyungjin
AU - Hovakimyan, Naira
N1 - Funding Information:
This work has been supported in part by the National Science Foundation through National Robotics Initiative grant number 1528036 and EAGER grant number 1548409, and by AFOSR grant number FA9550-15-1-0518.
PY - 2019/1/18
Y1 - 2019/1/18
N2 - The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication network allows the agents to learn the global value function corresponding to a global reward, which is the sum of the local rewards. In this paper, the problem is converted into a constrained convex optimization problem with a consensus constraint. We then propose a primal-dual distributed GTD algorithm and prove that it converges almost surely to a set of stationary points of the optimization problem.
AB - The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication network allows the agents to learn the global value function corresponding to a global reward, which is the sum of the local rewards. In this paper, the problem is converted into a constrained convex optimization problem with a consensus constraint. We then propose a primal-dual distributed GTD algorithm and prove that it converges almost surely to a set of stationary points of the optimization problem.
UR - http://www.scopus.com/inward/record.url?scp=85062179339&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062179339&partnerID=8YFLogxK
U2 - 10.1109/CDC.2018.8619839
DO - 10.1109/CDC.2018.8619839
M3 - Conference contribution
AN - SCOPUS:85062179339
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 1967
EP - 1972
BT - 2018 IEEE Conference on Decision and Control, CDC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 December 2018 through 19 December 2018
ER -