Off-policy reinforcement learning with Gaussian processes

Girish Chowdhary, Miao Liu, Robert Grande, Thomas Walsh, Jonathan How, Lawrence Carin

Research output: Contribution to journal › Article › peer-review


An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose the locations of its own bases.
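For orientation only, the general idea of modelling the Q function with a GP and training it off-policy from a batch of transitions can be sketched as below. This is not the GPQ algorithm from the article: it is a generic fitted Q-iteration loop with a plain GP regression mean, and every name in it (rbf_kernel, gp_mean, q_values, batch_gp_q_iteration), the toy chain environment, and the hyperparameter values (length_scale, noise_var, gamma, n_iters) are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_mean(X_train, y_train, X_query, length_scale=0.3, noise_var=1e-3):
    # Standard GP regression mean: k(X*, X) (K + noise_var * I)^{-1} y.
    # The hyperparameter values are illustrative, not taken from the article.
    K = rbf_kernel(X_train, X_train, length_scale)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_query, X_train, length_scale) @ alpha

def q_values(X_train, y_train, states, n_actions):
    # Predicted Q(s, a) for each state and action; shape (len(states), n_actions).
    states = np.asarray(states, dtype=float)
    cols = [
        gp_mean(X_train, y_train,
                np.column_stack([states, np.full(len(states), float(a))]))
        for a in range(n_actions)
    ]
    return np.column_stack(cols)

def batch_gp_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50):
    # transitions: (state, action, reward, next_state, done) tuples gathered
    # by any behavior policy, which is what makes the update off-policy.
    s, a, r, s2, done = (np.array(col, dtype=float) for col in zip(*transitions))
    X = np.column_stack([s, a])   # GP inputs are (state, action) pairs
    y = r.copy()                  # initial regression targets
    for _ in range(n_iters):
        # Q-learning target: reward plus discounted greedy value of the next state.
        next_q = q_values(X, y, s2, n_actions).max(axis=1)
        y = r + gamma * (1.0 - done) * next_q
    return X, y

# Hypothetical toy problem: a 5-state chain where reaching state 4 pays 1 and ends the episode.
transitions = []
for state in range(4):
    for action in (0, 1):  # 0 = move left, 1 = move right
        nxt = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if nxt == 4 else 0.0
        transitions.append((state, action, reward, nxt, nxt == 4))

X, y = batch_gp_q_iteration(transitions, n_actions=2)
q = q_values(X, y, [0, 1, 2, 3], n_actions=2)
greedy = q.argmax(axis=1)  # greedy action per state under the fitted GP
```

With the short length scale assumed here the GP behaves almost like a lookup table; a longer length scale would share value estimates between neighboring state-action pairs, and the article's convergence conditions concern how such hyperparameters are chosen.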

Original language: English (US)
Article number: 7004680
Pages (from-to): 227-238
Number of pages: 12
Journal: IEEE/CAA Journal of Automatica Sinica
Issue number: 3
State: Published - Jul 1 2014
Externally published: Yes


Keywords

  • Bayesian nonparametric
  • Gaussian processes
  • Reinforcement learning
  • Off-policy learning

ASJC Scopus subject areas

  • Control and Optimization
  • Artificial Intelligence
  • Information Systems
  • Control and Systems Engineering

