Abstract
We consider the control of a Markov decision process (MDP) that undergoes an abrupt change in its transition kernel (mode). We formulate the problem of minimizing the regret of a controller that switches based on mode-change detection, relative to a controller that observes the mode, as an optimal stopping problem. Through a sequence of approximations, we reduce it to a quickest change detection (QCD) problem with Markovian data, for which we characterize an optimal change detection policy of state-dependent threshold type. Numerical experiments illustrate various properties of our control-switching policy.
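To make the abstract's "state-dependent threshold-type" detection rule concrete, here is a minimal sketch of a QCD-driven control switch for Markovian data. The paper's exact statistic and threshold characterization are not reproduced here; the sketch assumes a CUSUM-style statistic built from transition log-likelihood ratios under hypothetical pre- and post-change kernels `P0` and `P1`, with a per-state threshold `thresholds[x]`. Function and variable names are illustrative only.

```python
import numpy as np

def cusum_switch_time(states, P0, P1, thresholds):
    """Illustrative sketch (not the paper's exact procedure).

    Return the first time a CUSUM-style statistic exceeds a
    state-dependent threshold, or None if it never does.

    states:     observed state trajectory x_0, x_1, ..., x_T (integer indices)
    P0, P1:     assumed pre-/post-change transition matrices (n x n)
    thresholds: per-state thresholds h(x), array of length n
    """
    W = 0.0  # detection statistic
    for t in range(1, len(states)):
        x_prev, x = states[t - 1], states[t]
        # Log-likelihood ratio of the observed transition under the
        # post-change kernel versus the pre-change kernel.
        llr = np.log(P1[x_prev, x]) - np.log(P0[x_prev, x])
        W = max(0.0, W + llr)
        if W >= thresholds[x]:
            return t  # switch to the controller designed for the new mode
    return None
```

A state-dependent threshold captures the intuition that the cost of delayed (or premature) switching can differ across states, so the stopping rule need not use a single scalar threshold.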
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 325-345 |
| Number of pages | 21 |
| Journal | Applied and Computational Mathematics |
| Volume | 23 |
| Issue number | 3 Special Issue |
| DOIs | |
| State | Published - 2024 |
Keywords
- Change Detection
- Detection Policy
- Optimal Change
- Optimal Stopping
- Piecewise Stationary Environment
- Sequence Approximations
- Switched Markov Decision Process
ASJC Scopus subject areas
- Computational Mathematics
- Applied Mathematics