TY - GEN
T1 - Semiparametric Information State Embedding for Policy Search under Imperfect Information
AU - Bhatt, Sujay
AU - Mao, Weichao
AU - Koppel, Alec
AU - Basar, Tamer
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - We consider the problem of policy search in sequential decision-making problems with imperfect information, as encapsulated by a partially observed Markov Decision Process (POMDP) over possibly continuous state spaces. In general, the optimal policy is history-dependent and the objective is non-convex in the policy parameters, making even stationary point policies challenging to ascertain. To address this problem class, we develop a constructive way to succinctly represent the history as an approximate information state, using Semiparametric Information State Embedding (SISE). SISE alternates between conditional kernel density estimation and fitting the parameters of an Echo State Network (ESN), a one-layer recurrent neural model. Building on SISE, we develop an actor-critic scheme for policy search over the approximate information states. Our main technical contributions are to (i) establish the convergence and generalization performance of SISE, and (ii) derive the convergence to stationary points of our policy search scheme. Experimentally, our fusion of SISE and actor-critic yields favorable performance on the canonical POMDPs of Tiger, LightDark, and a partially observed variant of CartPole.
UR - http://www.scopus.com/inward/record.url?scp=85126063208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126063208&partnerID=8YFLogxK
U2 - 10.1109/CDC45484.2021.9682964
DO - 10.1109/CDC45484.2021.9682964
M3 - Conference contribution
AN - SCOPUS:85126063208
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 4501
EP - 4506
BT - 60th IEEE Conference on Decision and Control, CDC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 60th IEEE Conference on Decision and Control, CDC 2021
Y2 - 13 December 2021 through 17 December 2021
ER -