Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

Research output: Contribution to journal · Article · peer-review

Abstract

In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
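To make the setting concrete, here is a minimal, hypothetical sketch of one TD(0) update for a two-layer ReLU value network with a max-norm projection step, in the spirit of the max-norm regularized algorithm the abstract refers to. The network form, step size, projection radius, and the simplification of projecting each hidden unit's weights onto a ball around the origin (rather than around initialization) are all illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a wide two-layer ReLU network approximating
# a value function over a d-dimensional state space.
d, m = 4, 256                                # state dimension, network width
W = rng.normal(size=(m, d)) / np.sqrt(d)     # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)          # output weights (held fixed, NTK-style)

def value(s, W):
    # f(s; W) = (1/sqrt(m)) * sum_i a_i * ReLU(w_i . s)
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def td_step(W, s, r, s_next, gamma=0.9, lr=0.1, radius=2.0):
    """One TD(0) semi-gradient step followed by a max-norm projection.

    The projection clips each hidden unit's weight vector back into an
    l2 ball of the given radius (simplified here to a ball around the
    origin; the radius and center are illustrative assumptions)."""
    delta = r + gamma * value(s_next, W) - value(s, W)   # TD error
    act = (W @ s > 0).astype(float)                      # ReLU derivative
    grad = (a * act)[:, None] * s[None, :] / np.sqrt(m)  # semi-gradient of f(s; W)
    W = W + lr * delta * grad
    # Max-norm regularization: rescale any row whose norm exceeds the radius.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W

s, s_next = rng.normal(size=d), rng.normal(size=d)
W = td_step(W, s, r=1.0, s_next=s_next)
print(W.shape)  # (256, 4)
```

The per-unit projection keeps every hidden weight vector uniformly bounded along the trajectory, which is the kind of control on the parameter process that a Lyapunov drift analysis can exploit; the projection-free variant would simply skip the rescaling step.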

Original language: English (US)
Pages (from-to): 2891-2905
Number of pages: 15
Journal: IEEE Transactions on Automatic Control
Volume: 68
Issue number: 5
DOIs
State: Published - May 1 2023

Keywords

  • Neural networks
  • reinforcement learning (RL)
  • stochastic approximation
  • temporal-difference (TD) learning

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
