Semiparametric Information State Embedding for Policy Search under Imperfect Information

Sujay Bhatt, Weichao Mao, Alec Koppel, Tamer Basar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider the problem of policy search in sequential decision making problems with imperfect information as encapsulated by a partially observed Markov Decision Process (POMDP) over possibly continuous state-spaces. In general, the optimal policy is history-dependent and the objective is non-convex in the policy parameters, making even stationary point policies challenging to ascertain. To address this problem class, we develop a constructive way to succinctly represent the history as an approximate information state, using Semiparametric Information State Embedding (SISE). SISE alternates between conditional kernel density estimation and fitting the parameters of an Echo State Network (ESN), a one-layer recurrent neural model. Based upon constructing SISE, we develop an actor-critic scheme for policy search over the approximate information states. Our main technical contributions are to (i) establish the convergence and generalization performance of SISE, and (ii) derive the convergence to stationary points of our policy search scheme. Experimentally, our fusion of SISE and actor-critic yields favorable performance in practice on the canonical POMDPs of Tiger, LightDark, and a partially observed variant of CartPole.
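As a rough illustration of the Echo State Network component named in the abstract, the sketch below shows how a fixed random recurrent reservoir can compress an observation history into a single state vector, with only a linear readout being fitted. It is not the authors' SISE procedure: the function names (`make_reservoir`, `run_reservoir`, `fit_readout`), the reservoir size, the spectral radius, and the ridge-regression readout are all assumptions chosen for illustration, and the conditional kernel density estimation step and the actor-critic update are omitted entirely.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for keeping the reservoir dynamics stable.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Map a sequence of observations (T, n_in) to reservoir states (T, n_res)."""
    h = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        # The recurrent state summarizes the history seen so far.
        h = np.tanh(W_in @ u + W @ h)
        states.append(h)
    return np.array(states)

def fit_readout(states, targets, ridge=1e-6):
    """Fit a linear readout by ridge regression; only these weights are trained."""
    n_res = states.shape[1]
    A = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets)  # shape (n_res, n_out)

# Toy usage: predict the next value of a scalar observation stream.
t = np.arange(300)
obs = np.sin(0.1 * t).reshape(-1, 1)
W_in, W = make_reservoir(n_in=1, n_res=50)
states = run_reservoir(W_in, W, obs[:-1])   # history embeddings
W_out = fit_readout(states, obs[1:])        # next-observation readout
pred = states @ W_out
```

In this toy setting each row of `states` plays the role of a compact summary of the history up to that time step, which is the kind of object a policy could condition on instead of the full history.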

Original language: English (US)
Title of host publication: 60th IEEE Conference on Decision and Control, CDC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4501-4506
Number of pages: 6
ISBN (Electronic): 9781665436595
DOIs
State: Published - 2021
Event: 60th IEEE Conference on Decision and Control, CDC 2021 - Austin, United States
Duration: Dec 13 2021 to Dec 17 2021

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2021-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 60th IEEE Conference on Decision and Control, CDC 2021
Country/Territory: United States
City: Austin
Period: 12/13/21 to 12/17/21

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
