Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh

Research output: Contribution to journalConference articlepeer-review


Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without adequate focus on the process of building the simulator. In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models). We approximate the real environment using a state-dependent linear combination of the ensemble, where the coefficients are determined by the given state features and some unknown parameters. Our proposed algorithm provably learns a near-optimal policy with a sample complexity polynomial in the number of unknown parameters, and incurs no dependence on the size of the state (or action) space. As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set. We provide exponential lower bounds that illustrate the fundamental hardness of this problem, and develop a provably efficient algorithm under additional natural assumptions.

Original languageEnglish (US)
Pages (from-to)2010-2020
Number of pages11
JournalProceedings of Machine Learning Research
StatePublished - 2020
Event23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online
Duration: Aug 26 2020Aug 28 2020

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


Dive into the research topics of 'Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles'. Together they form a unique fingerprint.

Cite this