Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient Method and Global Convergence

Joao Paulo Jansch-Porto, Bin Hu, Geir E. Dullerud

Research output: Contribution to journal › Article › peer-review


Recently, policy optimization has received renewed attention from the control community due to various applications in reinforcement learning tasks. In this paper, we investigate the global convergence of the gradient method for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and smoothness. Based on these properties, we prove that the gradient method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which is mean-square stabilizing. This work brings new insights for understanding the performance of the policy gradient method on the Markovian jump linear quadratic control problem.
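The gradient method described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code. It assumes the control law u_t = -K_{ω(t)} x_t, evaluates the cost through the standard coupled Lyapunov equations, and uses the gradient expression that follows from them. The two-mode system, the transition matrix `trans`, the step size, and all function names are hypothetical choices made for illustration. The open-loop dynamics are picked so that the zero gain K0 = 0 is mean-square stabilizing, and a coupled Riccati iteration is included only as a reference for the optimal gains.

```python
import numpy as np

def coupled_expectation(P, trans):
    # E_i(P) = sum_j trans[i, j] * P_j, stacked over modes i
    return np.einsum("ij,jab->iab", trans, P)

def T(M):
    # Batched transpose of a stack of matrices
    return np.swapaxes(M, 1, 2)

def solve_value(A, B, Q, R, K, trans, tol=1e-12, max_iter=10_000):
    # Fixed-point iteration on P_i = Q_i + K_i' R_i K_i + G_i' E_i(P) G_i,
    # with G_i = A_i - B_i K_i; converges when K is mean-square stabilizing.
    G = A - B @ K
    stage = Q + T(K) @ R @ K
    P = stage.copy()
    for _ in range(max_iter):
        P_next = stage + T(G) @ coupled_expectation(P, trans) @ G
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("K does not appear to be mean-square stabilizing")

def solve_state_correlation(A, B, K, trans, rho, X0, tol=1e-14, max_iter=10_000):
    # chi_i = sum_t E[x_t x_t' 1{mode_t = i}], propagated by
    # X_j(t+1) = sum_i trans[i, j] * G_i X_i(t) G_i'
    G = A - B @ K
    X = rho[:, None, None] * X0
    chi = X.copy()
    for _ in range(max_iter):
        X = np.einsum("ij,iab->jab", trans, G @ X @ T(G))
        chi += X
        if np.max(np.abs(X)) < tol:
            return chi
    raise RuntimeError("K does not appear to be mean-square stabilizing")

def cost_and_gradient(A, B, Q, R, K, trans, rho, X0):
    P = solve_value(A, B, Q, R, K, trans)
    chi = solve_state_correlation(A, B, K, trans, rho, X0)
    EP = coupled_expectation(P, trans)
    cost = np.einsum("i,iab,ba->", rho, P, X0)
    grad = 2.0 * ((R + T(B) @ EP @ B) @ K - T(B) @ EP @ A) @ chi
    return cost, grad

def gradient_method(A, B, Q, R, trans, rho, X0, K0, step=0.03, iters=800):
    # Plain gradient descent on the stacked gains K = (K_1, ..., K_N)
    K, costs = K0.copy(), []
    for _ in range(iters):
        cost, grad = cost_and_gradient(A, B, Q, R, K, trans, rho, X0)
        costs.append(cost)
        K = K - step * grad
    return K, costs

def riccati_gains(A, B, Q, R, trans, iters=500):
    # Reference solution: value iteration on the coupled Riccati equations
    P = Q.copy()
    for _ in range(iters):
        EP = coupled_expectation(P, trans)
        K = np.linalg.solve(R + T(B) @ EP @ B, T(B) @ EP @ A)
        P = Q + T(A) @ EP @ (A - B @ K)
    return K

# Hypothetical two-mode example (all values are illustrative assumptions)
A = np.array([[[0.4, 0.2], [0.0, 0.5]], [[0.6, 0.0], [0.1, 0.3]]])
B = np.array([[[0.0], [1.0]], [[1.0], [0.5]]])
Q = np.stack([np.eye(2)] * 2)
R = np.stack([np.eye(1)] * 2)
trans = np.array([[0.9, 0.1], [0.3, 0.7]])  # trans[i, j] = P(next mode j | mode i)
rho = np.array([0.5, 0.5])                  # initial mode distribution
X0 = np.eye(2)                              # E[x_0 x_0']
K0 = np.zeros((2, 1, 2))                    # stabilizing here since A_i are contractive

K_gd, costs = gradient_method(A, B, Q, R, trans, rho, X0, K0)
K_star = riccati_gains(A, B, Q, R, trans)
print("initial cost:", costs[0], "final cost:", costs[-1])
print("max gap to Riccati gains:", np.max(np.abs(K_gd - K_star)))
```

In this sketch the step size is fixed and chosen conservatively by hand. The two solver routines raise an error if the iterate leaves the mean-square stabilizing set, which mirrors the initialization requirement stated in the abstract.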

Original language: English (US)
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Transactions on Automatic Control
State: Accepted/In press - 2022


Keywords

  • Convergence
  • Costs
  • Gradient methods
  • Linear systems
  • Markov processes
  • Markovian jump linear systems
  • Optimization
  • State feedback
  • Optimal control
  • Policy gradient methods
  • Reinforcement learning

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering

