Safety-Guaranteed, Accelerated Learning in MDPs with Local Side Information

Pranay Thangeda, Melkior Ornik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In environments with uncertain dynamics, synthesis of optimal control policies mandates exploration. The applicability of classical learning algorithms to real-world problems is often limited by the number of time steps required for learning the environment model. Given some local side information about the differences in transition probabilities of the states, potentially obtained from the agent's onboard sensors, we generalize the idea of indirect sampling for accelerated learning to propose an algorithm that balances between exploration and exploitation. We formalize this idea by introducing the notion of the value of information in the context of a Markov decision process with unknown transition probabilities, as a measure of the expected improvement in the agent's current estimate of transition probabilities by taking a particular action. By exploiting available local side information and maximizing the estimated value of learned information at each time step, we accelerate the learning process and subsequent synthesis of the optimal control policy. Further, we define the notion of agent safety, a vital consideration for physical systems, in the context of our problem. Under certain assumptions, we provide guarantees on the safety of an agent exploring with our algorithm that exploits local side information. We illustrate agent safety and the improvement in learning speed using numerical experiments in the setting of a Mars rover, with data from onboard sensors acting as the local side information.

Original language: English (US)
Title of host publication: 2020 American Control Conference, ACC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1099-1104
Number of pages: 6
ISBN (Electronic): 9781538682661
State: Published - Jul 2020
Event: 2020 American Control Conference, ACC 2020 - Denver, United States
Duration: Jul 1 2020 → Jul 3 2020

Publication series

Name: Proceedings of the American Control Conference
Volume: 2020-July
ISSN (Print): 0743-1619

Conference

Conference: 2020 American Control Conference, ACC 2020
Country/Territory: United States
City: Denver
Period: 7/1/20 → 7/3/20

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
