We formulate and solve a dynamic stochastic optimization problem of a nonstandard type whose optimal solution features active learning. The proof of optimality and the derivation of the corresponding control policies are indirect: the original single-person optimization problem is related to a sequence of nested zero-sum stochastic games. Existence of saddle points for these games implies the existence of optimal policies for the original stochastic control problem, which, in turn, can be obtained from the solution of a nonlinear deterministic optimal control problem. The paper also studies the existence of stationary optimal policies when the time horizon is infinite and the objective function is discounted.
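To fix ideas, recall the standard saddle-point condition for a zero-sum game; the symbols below (minimizer strategy $\gamma \in \Gamma$, maximizer strategy $\delta \in \Delta$, cost $J$) are illustrative and need not match the paper's notation. A pair $(\gamma^*, \delta^*)$ is a saddle point if

\[
J(\gamma^*, \delta) \;\le\; J(\gamma^*, \delta^*) \;\le\; J(\gamma, \delta^*)
\qquad \forall\, \gamma \in \Gamma,\ \delta \in \Delta,
\]

which implies the minimax equality

\[
\inf_{\gamma \in \Gamma}\,\sup_{\delta \in \Delta} J(\gamma,\delta)
\;=\;
\sup_{\delta \in \Delta}\,\inf_{\gamma \in \Gamma} J(\gamma,\delta)
\;=\;
J(\gamma^*,\delta^*).
\]

It is existence results of this type, for each game in the nested sequence, that yield the optimal policies of the original single-person control problem.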
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering