TY - JOUR
T1 - Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation
AU - Cayci, Semih
AU - Satpathi, Siddhartha
AU - He, Niao
AU - Srikant, R.
N1 - Funding Information:
This work was supported in part by the National Science Foundation under Grants CCF 22-07547, CCF 19-34986, and CNS 21-06801, in part by the Office of Naval Research under Grant N00014-19-1-2566, and in part by the Swiss National Science Foundation Project Funding under Grant 200021-207343.
Publisher Copyright:
© 1963-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
AB - In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
KW - Neural networks
KW - reinforcement learning (RL)
KW - stochastic approximation
KW - temporal-difference (TD) learning
UR - http://www.scopus.com/inward/record.url?scp=85147223079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147223079&partnerID=8YFLogxK
U2 - 10.1109/TAC.2023.3234234
DO - 10.1109/TAC.2023.3234234
M3 - Article
AN - SCOPUS:85147223079
SN - 0018-9286
VL - 68
SP - 2891
EP - 2905
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 5
ER -