TY - GEN
T1 - Stochastic primal-dual Q-learning algorithm for discounted MDPs
AU - Lee, Donghwan
AU - He, Niao
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grant No. 1755829.
Publisher Copyright:
© 2019 American Automatic Control Council.
PY - 2019/7
Y1 - 2019/7
AB - In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior policies. Our algorithm, called stochastic primal-dual Q-learning (SPD Q-learning), hinges upon a new linear programming formulation and a dual perspective of standard Q-learning. In contrast to previous primal-dual RL algorithms, SPD Q-learning includes a Q-function estimation step, thus allowing an approximate policy to be recovered from the primal solution as well as the dual solution. We prove a first-of-its-kind result that SPD Q-learning guarantees a certain convergence rate, even when the state-action distribution under a given behavior policy is time-varying but converges sub-linearly to a stationary distribution.
UR - http://www.scopus.com/inward/record.url?scp=85072302869&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072302869&partnerID=8YFLogxK
U2 - 10.23919/acc.2019.8815275
DO - 10.23919/acc.2019.8815275
M3 - Conference contribution
AN - SCOPUS:85072302869
T3 - Proceedings of the American Control Conference
SP - 4897
EP - 4902
BT - 2019 American Control Conference, ACC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 American Control Conference, ACC 2019
Y2 - 10 July 2019 through 12 July 2019
ER -