Expedited Learning in MDPs with Side Information

Melkior Ornik, Jie Fu, Niklas T. Lauffer, W. K. Perera, Mohammed Alshiekh, Masahiro Ono, Ufuk Topcu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Standard methods for synthesis of control policies in Markov decision processes with unknown transition probabilities largely rely on a combination of exploration and exploitation. While these methods often offer theoretical guarantees on system performance, the number of time steps and samples needed to initially explore the environment before synthesizing a well-performing control policy is impractically large. This paper partially alleviates such a burden by incorporating a priori existing knowledge into learning, when such knowledge is available. Based on prior information about bounds on the differences between the transition probabilities at different states, we propose a learning approach where the transition probabilities at a given state are not only learned from outcomes of repeatedly performing a certain action at that state, but also from outcomes of performing actions at states that are known to have similar transition probabilities. Since the directly obtained information is more reliable at determining transition probabilities than second-hand information, i.e., information obtained from similar but potentially slightly different states, samples obtained indirectly are weighted with respect to the known bounds on the differences of transition probabilities. While the proposed strategy can naturally lead to errors in learned transition probabilities, we show that, by proper choice of the weights, such errors can be reduced, and the number of steps needed to form a near-optimal control policy in the Bayesian sense can be significantly decreased.
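The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the weight function `side_info_weight`, its `sharpness` parameter, and the data layout are all hypothetical choices made for illustration. The sketch estimates the transition probabilities at one target state for a fixed action by pooling direct samples (weight 1) with samples from similar states, which are down-weighted according to the known bound on how much their transition probabilities can differ.

```python
def side_info_weight(bound, sharpness=10.0):
    """Hypothetical weight: 1 for the state itself (bound 0), smaller for looser bounds."""
    return 1.0 / (1.0 + sharpness * bound)


def estimate_transitions(samples, bounds, n_states, weight_fn=side_info_weight):
    """Weighted empirical estimate of P(next state | target state, fixed action).

    samples: dict mapping a source state to the list of next states observed
             after taking the action there.
    bounds:  dict mapping a source state to the known bound on how far its
             transition probabilities are from the target state's
             (0 for the target state itself).
    """
    weighted_counts = [0.0] * n_states
    total = 0.0
    for source, outcomes in samples.items():
        w = weight_fn(bounds[source])
        for nxt in outcomes:
            weighted_counts[nxt] += w
            total += w
    if total == 0.0:
        # No usable data: fall back to a uniform estimate.
        return [1.0 / n_states] * n_states
    return [c / total for c in weighted_counts]


# Two direct samples at state 0, plus eight second-hand samples from state 1,
# which is assumed to differ from state 0 by at most 0.05.
samples = {0: [1, 1], 1: [1, 1, 1, 1, 1, 1, 2, 2]}
bounds = {0: 0.0, 1: 0.05}
estimate = estimate_transitions(samples, bounds, n_states=3)
print(estimate)
```

With only two direct samples, the estimate for state 0 would put all probability on one successor; the pooled estimate also assigns some probability to the successor seen only at the similar state, at the cost of a bias that the known bound limits.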

Original language: English (US)
Title of host publication: 2018 IEEE Conference on Decision and Control, CDC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1941-1948
Number of pages: 8
ISBN (Electronic): 9781538613955
State: Published - Jul 2 2018
Externally published: Yes
Event: 57th IEEE Conference on Decision and Control, CDC 2018 - Miami, United States
Duration: Dec 17 2018 to Dec 19 2018

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2018-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 57th IEEE Conference on Decision and Control, CDC 2018
Country/Territory: United States
City: Miami
Period: 12/17/18 to 12/19/18

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
