TY - JOUR
T1 - Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning
AU - Chen, Tianyi
AU - Zhang, Kaiqing
AU - Giannakis, Georgios B.
AU - Basar, Tamer
N1 - Funding Information:
This work was supported in part by the National Science Foundation under Grant 1509040, Grant 1508993, and Grant 1711471, in part by the United States Army Research Laboratory under Grant W911NF-17-2-0196, in part by the Rensselaer-IBM AI Research Collaboration, and in part by the IBM AI Horizons Network.
Publisher Copyright:
© 2014 IEEE.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - This article deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered: multiagent reinforcement learning (RL) and parallel RL, where frequent information exchanges between the learners and the controller are required. For many practical distributed systems, however, the overhead caused by these frequent communication exchanges is considerable, and becomes the bottleneck of the overall performance. To address this challenge, a novel policy gradient approach is developed for solving distributed RL. The novel approach adaptively skips the policy gradient communication during iterations, and can reduce the communication overhead without degrading learning performance. It is established analytically that: i) the novel algorithm has a convergence rate identical to that of the plain-vanilla policy gradient; while ii) if the distributed learners are heterogeneous in terms of their reward functions, the number of communication rounds needed to achieve a desirable learning accuracy is markedly reduced. Numerical experiments corroborate the communication reduction attained by the novel algorithm compared to alternatives.
AB - This article deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered: multiagent reinforcement learning (RL) and parallel RL, where frequent information exchanges between the learners and the controller are required. For many practical distributed systems, however, the overhead caused by these frequent communication exchanges is considerable, and becomes the bottleneck of the overall performance. To address this challenge, a novel policy gradient approach is developed for solving distributed RL. The novel approach adaptively skips the policy gradient communication during iterations, and can reduce the communication overhead without degrading learning performance. It is established analytically that: i) the novel algorithm has a convergence rate identical to that of the plain-vanilla policy gradient; while ii) if the distributed learners are heterogeneous in terms of their reward functions, the number of communication rounds needed to achieve a desirable learning accuracy is markedly reduced. Numerical experiments corroborate the communication reduction attained by the novel algorithm compared to alternatives.
KW - Aerospace electronics
KW - Approximation algorithms
KW - Collaboration
KW - Convergence
KW - Reinforcement learning
KW - Task analysis
KW - Trajectory
KW - communication-efficient learning
KW - distributed learning
KW - multi-agent
KW - policy gradient
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85105873014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105873014&partnerID=8YFLogxK
U2 - 10.1109/TCNS.2021.3078100
DO - 10.1109/TCNS.2021.3078100
M3 - Article
AN - SCOPUS:85105873014
SN - 2325-5870
VL - 9
SP - 917
EP - 929
JO - IEEE Transactions on Control of Network Systems
JF - IEEE Transactions on Control of Network Systems
IS - 2
ER -