Dynamic parameters in Sequential Decision Making

Amber Srivastava, Srinivasa M. Salapaka

Research output: Contribution to journal › Article › peer-review


Sequential Decision Making (SDM) problems optimize over the sequence of actions (or decisions) taken to minimize the underlying cumulative cost. This sequence of actions is referred to as the policy of the SDM. Often these problems comprise additional (fixed and manipulable) parameters, and the objective is to determine the optimal policy as well as the manipulable parameters that minimize the SDM cost. In this paper we address the class of SDM problems that are characterized by dynamic parameters, where the dynamics are pre-specified for a subset of parameters and manipulable for others. The objective is to determine the manipulable parameter dynamics as well as the time-varying policy such that the associated SDM cost is minimized at each time instant. To this end, we develop a control-theoretic framework to design the manipulable parameter dynamics such that they track the optimal values of the parameters, and to simultaneously determine the time-varying optimal policy. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework that addresses SDMs. More precisely, this framework results in a smooth approximation of the SDM cost, which we use as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track the local optimum, that the proposed control law is Lipschitz continuous and bounded, and that the policy of the SDM is optimal for a given set of parameter values. Simulations demonstrate the efficacy of the proposed methodology.
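The idea of using a smooth, MEP-style approximation of the cost as a Lyapunov-like function can be illustrated with a minimal sketch. This is not the control law from the paper, which the abstract does not specify. It assumes a hypothetical one-step toy problem with a scalar parameter `theta`, two actions with made-up quadratic costs, the standard log-sum-exp soft-min as the smooth cost, and a plain gradient flow as a stand-in controller. All names and numbers below are illustrative assumptions.

```python
import math


def gibbs_policy(costs, beta):
    """Gibbs (maximum-entropy) distribution over actions for the given costs."""
    m = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (c - m)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def free_energy(costs, beta):
    """Smooth soft-min of the costs: -(1/beta) * log(sum_j exp(-beta * c_j))."""
    m = min(costs)
    return m - math.log(sum(math.exp(-beta * (c - m)) for c in costs)) / beta


# Hypothetical toy problem (an assumption, not from the paper):
# action j costs (theta - TARGETS[j])**2 + OFFSETS[j].
TARGETS = [-1.0, 2.0]
OFFSETS = [0.5, 0.0]


def action_costs(theta):
    return [(theta - t) ** 2 + b for t, b in zip(TARGETS, OFFSETS)]


def free_energy_gradient(theta, beta):
    """dF/dtheta = sum_j p_j * dc_j/dtheta, where p is the Gibbs policy."""
    p = gibbs_policy(action_costs(theta), beta)
    return sum(pj * 2.0 * (theta - t) for pj, t in zip(p, TARGETS))


def simulate(theta0, beta=5.0, gain=1.0, dt=0.01, steps=2000):
    """Euler-integrate theta_dot = -gain * dF/dtheta, recording F at each step."""
    theta, history = theta0, []
    for _ in range(steps):
        history.append(free_energy(action_costs(theta), beta))
        theta -= dt * gain * free_energy_gradient(theta, beta)
    return theta, history
```

In this sketch the recorded free energy does not increase along the trajectory, which is the sense in which it plays a Lyapunov-like role, and `gibbs_policy` gives the corresponding policy at each parameter value. Starting from `simulate(1.0)`, `theta` moves toward the nearby local minimizer close to 2.0.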

Original language: English (US)
Article number: 110795
State: Published - Feb 2023
Externally published: Yes


Keywords

  • Markov decision processes
  • Maximum Entropy Principle
  • Parameterized state and action spaces

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

