Abstract
This paper develops a novel and unified framework to analyze the convergence of a large family of Q-learning algorithms from the switching system perspective. We show that the nonlinear ODE models associated with Q-learning and many of its variants can be naturally formulated as affine switching systems. Building on their asymptotic stability, we obtain a number of interesting results: (i) we provide a simple ODE analysis for the convergence of asynchronous Q-learning under relatively weak assumptions; (ii) we establish the first convergence analysis of the averaging Q-learning algorithm, and (iii) we derive a new sufficient condition for the convergence of Q-learning with linear function approximation.
Original language | English (US) |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 2020-December |
State | Published - 2020 |
Event | 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online Duration: Dec 6 2020 → Dec 12 2020 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing