TY - GEN
T1 - Convergence of Monte Carlo Exploring Starts with TD-Learning
AU - Winnicki, Anna
AU - Srikant, R.
N1 - This research is supported in part by ONR Grant N00014-19-1-2566, and NSF Grants CNS 23-12714, CNS 21-06801, and CCF 22-07547.
PY - 2024
Y1 - 2024
N2 - TD-learning has been widely employed in reinforcement learning algorithms due to its efficiency and practicality. Herein, we study the convergence of a variant of Monte Carlo Exploring Starts when TD(λ) is used in policy evaluation and policy improvement, and lookahead is used in the policy improvement step. Our results provide a threshold for the amount of lookahead that ensures convergence of Monte Carlo Exploring Starts with TD(λ) as a function of λ ∈ [0,1].
AB - TD-learning has been widely employed in reinforcement learning algorithms due to its efficiency and practicality. Herein, we study the convergence of a variant of Monte Carlo Exploring Starts when TD(λ) is used in policy evaluation and policy improvement, and lookahead is used in the policy improvement step. Our results provide a threshold for the amount of lookahead that ensures convergence of Monte Carlo Exploring Starts with TD(λ) as a function of λ ∈ [0,1].
UR - http://www.scopus.com/inward/record.url?scp=86000549773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000549773&partnerID=8YFLogxK
U2 - 10.1109/CDC56724.2024.10886885
DO - 10.1109/CDC56724.2024.10886885
M3 - Conference contribution
AN - SCOPUS:86000549773
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 3865
EP - 3870
BT - 2024 IEEE 63rd Conference on Decision and Control, CDC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 63rd IEEE Conference on Decision and Control, CDC 2024
Y2 - 16 December 2024 through 19 December 2024
ER -