TY - GEN
T1 - Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
AU - Winnicki, Anna
AU - Srikant, R.
N1 - Acknowledgment: This research was supported by grants ONR N00014-19-1-2566, NSF CCF 17-04970, NSF CCF 1934986, and ARO W911NF-19-1-0379.
PY - 2022
Y1 - 2022
N2 - We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs) that combines stochastic approximation algorithms with state-of-the-art techniques useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first takes a least squares approach, in which a new set of weights associated with the feature vectors is obtained via least squares minimization at each iteration. The second is a two-time-scale algorithm that takes several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
AB - We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs) that combines stochastic approximation algorithms with state-of-the-art techniques useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first takes a least squares approach, in which a new set of weights associated with the feature vectors is obtained via least squares minimization at each iteration. The second is a two-time-scale algorithm that takes several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85147037519&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147037519&partnerID=8YFLogxK
U2 - 10.1109/CDC51059.2022.9992427
DO - 10.1109/CDC51059.2022.9992427
M3 - Conference contribution
AN - SCOPUS:85147037519
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 801
EP - 806
BT - 2022 IEEE 61st Conference on Decision and Control, CDC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st IEEE Conference on Decision and Control, CDC 2022
Y2 - 6 December 2022 through 9 December 2022
ER -