Expedited Online Learning With Spatial Side Information

Pranay Thangeda, Melkior Ornik, Ufuk Topcu

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

The applicability of model-based online reinforcement learning algorithms is often limited by the amount of exploration required for learning the environment model to the desired level of accuracy. A promising approach to addressing this issue is to exploit side information, available either a priori or during the agent's mission, for learning the unknown dynamics. Side information in our context refers to information in the form of bounds on the differences between transition probabilities at different states in the environment. We use this information as a measure of reusability of the direct experience gained by performing actions and observing the outcomes at different states. We propose a framework to integrate side information into existing model-based reinforcement learning algorithms by complementing the samples obtained directly at states with second-hand information obtained from other states with similar dynamics. Additionally, we propose an algorithm for synthesizing the optimal control strategy in unknown environments by using side information to effectively balance between exploration and exploitation. We prove that, with high probability, the proposed algorithm yields a near-optimal policy in the Bayesian sense, while also guaranteeing the safety of the agent during exploration. We obtain the near-optimal policy in time steps that are polynomial in terms of the parameters describing the model. We illustrate the utility of the proposed algorithms in a setting of a Mars rover, with data from onboard sensors and a companion aerial vehicle acting as the side information.
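The idea of complementing direct samples with second-hand samples from states with similar dynamics can be illustrated with a minimal sketch. This is not the algorithm from the article: the function name `pooled_estimate`, the `max_bound` threshold, and the simple unweighted pooling rule are all assumptions made for illustration. The sketch only shows how a bound on the difference between transition probabilities at two states could decide whether observations from one state are reused at the other.

```python
from collections import Counter


def pooled_estimate(counts, bounds, state, max_bound):
    """Estimate outcome probabilities at `state` from direct and reused samples.

    counts:    dict mapping state -> dict of outcome -> observed count
    bounds:    dict mapping (state, other) -> assumed bound on the difference
               between the transition probabilities at the two states
    max_bound: hypothetical threshold; samples from `other` are reused only
               if the bound for (state, other) is at most this value
    """
    # Start from the samples observed directly at the state of interest.
    pooled = Counter(counts.get(state, {}))

    # Add second-hand samples from states whose dynamics are known to be close.
    for other, observed in counts.items():
        if other == state:
            continue
        # A missing bound means no side information, so nothing is reused.
        if bounds.get((state, other), float("inf")) <= max_bound:
            pooled.update(observed)

    total = sum(pooled.values())
    if total == 0:
        return {}
    return {outcome: n / total for outcome, n in pooled.items()}


# Hypothetical example: state "a" has few samples, "b" is similar, "c" is not.
counts = {
    "a": {"north": 1, "stay": 1},
    "b": {"north": 3, "stay": 1},
    "c": {"stay": 10},
}
bounds = {("a", "b"): 0.05, ("a", "c"): 0.9}
estimate = pooled_estimate(counts, bounds, "a", max_bound=0.1)
```

Here the estimate at state "a" draws on the four samples from "b" but ignores those from "c", whose bound exceeds the threshold. A more careful scheme would presumably discount reused samples according to the size of the bound rather than pooling them at full weight.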

Original language: English (US)
Pages (from-to): 1479-1491
Number of pages: 13
Journal: IEEE Transactions on Automatic Control
Volume: 68
Issue number: 3
State: Published - Mar 1 2023

Keywords

  • Markov decision processes (MDPs)
  • online learning
  • planning
  • side information

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications
