Improved upper bounds on the expected error in constant step-size Q-learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider fixed step-size Q-learning algorithms applied to finite-state, finite-action, discounted-reward Markov decision problems (MDPs). In previous work we derived a bound on the first moment of the Q-value estimation error, specifically on the expected steady-state value of the infinity norm of the error. The goal in both this paper and the previous one is to maximize a discounted sum of rewards over an infinite time horizon. However, the bound in our previous work holds only when the step size is sufficiently, and sometimes impractically, small. In this paper, we present a new error bound that, as before, goes to zero as the step size goes to zero, but is also valid for all values of the step size. To obtain the new bound, we divide time into frames such that the probability that some state is not visited within a frame is strictly less than 1; the error bound is then obtained by sampling the system once in each frame.
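For concreteness, the fixed step-size Q-learning iteration studied in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the small random MDP (P, R), the step size EPSILON, the discount factor GAMMA, and the uniform-random exploration policy are all illustrative assumptions introduced here.

```python
import numpy as np

# Minimal sketch of constant (fixed) step-size Q-learning on a finite,
# discounted-reward MDP. All numerical choices below are assumptions made
# for illustration; they are not taken from the paper.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 2   # small finite state/action spaces (assumed)
GAMMA = 0.9                  # discount factor (assumed)
EPSILON = 0.1                # constant step size (assumed)

# Random transition kernel P[s, a, s'] and reward table R[s, a] (assumed).
P = rng.random((N_STATES, N_ACTIONS, N_STATES))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((N_STATES, N_ACTIONS))

Q = np.zeros((N_STATES, N_ACTIONS))
state = 0

for t in range(100_000):
    # Behavior policy: uniform random exploration, so every state keeps
    # being visited (the frame-based argument in the paper only requires
    # that each state is visited within a frame with positive probability).
    action = rng.integers(N_ACTIONS)
    next_state = rng.choice(N_STATES, p=P[state, action])
    reward = R[state, action]

    # Fixed step-size Q-learning update: because EPSILON does not decay,
    # Q does not converge exactly; the paper bounds the expected
    # steady-state infinity-norm error of this iterate.
    td_target = reward + GAMMA * Q[next_state].max()
    Q[state, action] += EPSILON * (td_target - Q[state, action])

    state = next_state

print(Q)
```

Because the step size stays constant, the iterate fluctuates around the optimal Q-values rather than converging to them, which is why the quantity of interest is the expected steady-state error rather than an asymptotic limit.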

Original language: English (US)
Title of host publication: 2013 American Control Conference, ACC 2013
Pages: 1926-1931
Number of pages: 6
State: Published - 2013
Event: 2013 American Control Conference, ACC 2013 - Washington, DC, United States
Duration: Jun 17, 2013 - Jun 19, 2013

Other

Other: 2013 American Control Conference, ACC 2013
Country: United States
City: Washington, DC
Period: 6/17/13 - 6/19/13

ASJC Scopus subject areas

  • Electrical and Electronic Engineering


Cite this

Beck, C. L., & Srikant, R. (2013). Improved upper bounds on the expected error in constant step-size Q-learning. In 2013 American Control Conference, ACC 2013 (pp. 1926-1931). [6580117]