Entropy Maximization for Constrained Markov Decision Processes

Yagiz Savas, Melkior Ornik, Murat Cubuktepe, Ufuk Topcu

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to expected reward constraints. Such a policy minimizes the predictability of the paths it generates in an MDP while attaining certain reward thresholds. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded, and we provide necessary and sufficient conditions for each of these cases. We then present an algorithm to synthesize a policy that maximizes the entropy of an MDP. The proposed algorithm is based on a convex optimization problem and runs in time polynomial in the size of the MDP. Finally, we extend the algorithm to an MDP subject to expected total reward constraints. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the trade-off between the predictability of paths and the level of the collected reward.
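To make the quantities in the abstract concrete, here is a minimal sketch that evaluates fixed policies rather than synthesizing one. It assumes that the entropy of a policy is the entropy of the state paths of the Markov chain the policy induces, P_pi(s, s') = sum_a pi(a | s) P(s, a, s'), and that this chain is absorbing. Under those assumptions the chain rule of entropy gives H(s) = h(s) + sum_{s'} P_pi(s, s') H(s'), where h(s) is the entropy of the next-state distribution at s. The toy MDP, its rewards, the two policies, and every function name below are hypothetical illustrations; this is not the convex optimization problem proposed in the paper, and it does not test the paper's conditions for finite, infinite, or unbounded entropy.

```python
import math

# Hypothetical toy MDP (not from the paper): P[s][a] maps successor -> probability.
# State 2 is absorbing: its only action loops back to itself.
P = {
    0: {"left": {1: 1.0}, "right": {1: 0.5, 2: 0.5}},
    1: {"stay": {0: 0.5, 2: 0.5}, "exit": {2: 1.0}},
    2: {"loop": {2: 1.0}},
}
# Hypothetical reward R[s][a] collected when action a is taken in state s.
R = {
    0: {"left": 1.0, "right": 0.0},
    1: {"stay": 0.0, "exit": 2.0},
    2: {"loop": 0.0},
}


def induced_chain(P, policy):
    """Markov chain P_pi(s, s') = sum_a pi(a | s) * P(s, a, s')."""
    chain = {}
    for s, actions in P.items():
        row = {}
        for a, successors in actions.items():
            weight = policy[s].get(a, 0.0)
            for t, p in successors.items():
                row[t] = row.get(t, 0.0) + weight * p
        chain[s] = row
    return chain


def local_entropy(row):
    """Entropy (in nats) of a single next-state distribution."""
    return -sum(p * math.log(p) for p in row.values() if p > 0)


def path_entropy(chain, start, iters=10_000, tol=1e-12):
    """Entropy of the state paths from `start`.

    Iterates H(s) = h(s) + sum_{s'} P(s, s') H(s') (chain rule of entropy)
    from H = 0. This converges when every path is absorbed with probability 1
    into states whose local entropy is zero; otherwise H keeps growing.
    """
    H = {s: 0.0 for s in chain}
    for _ in range(iters):
        new = {
            s: local_entropy(row) + sum(p * H[t] for t, p in row.items())
            for s, row in chain.items()
        }
        if max(abs(new[s] - H[s]) for s in chain) < tol:
            return new[start]
        H = new
    raise RuntimeError("no convergence: path entropy may be infinite")


def expected_total_reward(P, R, policy, start, iters=10_000, tol=1e-12):
    """Expected total reward from `start` under a stationary randomized policy."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        new = {
            s: sum(
                policy[s].get(a, 0.0)
                * (R[s][a] + sum(p * V[t] for t, p in successors.items()))
                for a, successors in actions.items()
            )
            for s, actions in P.items()
        }
        if max(abs(new[s] - V[s]) for s in P) < tol:
            return new[start]
        V = new
    raise RuntimeError("no convergence: expected total reward may be unbounded")


# Two hypothetical policies: deterministic (reward-maximizing) vs. uniform.
det = {0: {"left": 1.0}, 1: {"exit": 1.0}, 2: {"loop": 1.0}}
uni = {0: {"left": 0.5, "right": 0.5}, 1: {"stay": 0.5, "exit": 0.5}, 2: {"loop": 1.0}}

for name, pi in [("deterministic", det), ("uniform", uni)]:
    entropy = path_entropy(induced_chain(P, pi), start=0)
    reward = expected_total_reward(P, R, pi, start=0)
    print(f"{name}: entropy = {entropy:.4f} nats, expected reward = {reward:.4f}")
```

In this toy example the deterministic policy collects the maximum expected total reward of 3 but generates a single path (entropy 0), while the uniform policy has positive path entropy (about 1.21 nats) and collects less reward (about 1.54). This is a small instance of the predictability/reward trade-off the abstract describes; a reward constraint would rule out policies whose expected total reward falls below a chosen threshold.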

Original language: English (US)
Title of host publication: 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 911-918
Number of pages: 8
ISBN (Electronic): 9781538665961
State: Published - Jul 2 2018
Externally published: Yes
Event: 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2018 - Monticello, United States
Duration: Oct 2 2018 to Oct 5 2018

Publication series

Name: 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2018

Conference

Conference: 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2018
Country/Territory: United States
City: Monticello
Period: 10/2/18 to 10/5/18

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
  • Energy Engineering and Power Technology
  • Control and Optimization
