Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for them. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
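As an illustrative aid only (not the authors' exact algorithm, architecture, or analysis), the sketch below shows how a max-norm regularized neural TD(0) update might look for a two-layer ReLU value network. The network width, step size, projection radius, and the toy random-walk task are all assumptions made for the example; passing `max_norm=None` recovers a projection-free variant.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class NeuralTD:
    """Two-layer ReLU value network trained with semi-gradient TD(0).

    max_norm=None gives a projection-free variant; a finite max_norm applies
    a per-neuron max-norm projection after each update (illustrative only).
    """

    def __init__(self, state_dim, width, lr=0.05, gamma=0.9, max_norm=None, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(width, state_dim))    # hidden-layer weights (trained)
        self.b = rng.choice([-1.0, 1.0], size=width)    # output signs (kept fixed)
        self.scale = 1.0 / np.sqrt(width)               # standard 1/sqrt(m) scaling
        self.lr, self.gamma, self.max_norm = lr, gamma, max_norm

    def value(self, s):
        # V(s) = (1/sqrt(m)) * sum_i b_i * relu(w_i . s)
        return self.scale * self.b @ relu(self.W @ s)

    def td_update(self, s, r, s_next):
        # TD error: delta = r + gamma * V(s') - V(s)
        delta = r + self.gamma * self.value(s_next) - self.value(s)
        # Gradient of V(s) w.r.t. each hidden weight row w_i
        active = (self.W @ s > 0).astype(float)         # ReLU activation indicator
        grad = self.scale * (self.b * active)[:, None] * s[None, :]
        self.W += self.lr * delta * grad                # semi-gradient TD step
        if self.max_norm is not None:
            # Max-norm regularization: rescale each neuron's weight vector so
            # its Euclidean norm never exceeds the radius max_norm.
            norms = np.linalg.norm(self.W, axis=1, keepdims=True)
            self.W *= np.minimum(1.0, self.max_norm / np.maximum(norms, 1e-12))
        return delta

# Toy usage (assumed setup): policy evaluation on a 5-state random walk
# with one-hot state features; reaching the rightmost state gives reward 1.
n_states = 5
agent = NeuralTD(state_dim=n_states, width=64, max_norm=2.0)
rng = np.random.default_rng(1)
s = 2
for _ in range(5000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    agent.td_update(np.eye(n_states)[s], r, np.eye(n_states)[s_next])
    s = 2 if s_next == n_states - 1 else s_next       # restart episode at the middle
```

The per-neuron rescaling step is one common reading of max-norm regularization; the projection radius and the decision to keep the output layer fixed are choices made here for brevity, not details taken from the article.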
Original language | English (US) |
---|---|
Pages (from-to) | 2891-2905 |
Number of pages | 15 |
Journal | IEEE Transactions on Automatic Control |
Volume | 68 |
Issue number | 5 |
DOIs | |
State | Published - May 1 2023 |
Keywords
- Neural networks
- reinforcement learning (RL)
- stochastic approximation
- temporal-difference (TD) learning
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering