Abstract
Deception is a useful tool in situations where an agent operates in the presence of adversaries. We consider a setting in which a supervisor provides a reference policy to an agent, expects the agent to follow that policy in an environment, and partially observes the agent's behavior. The agent, however, follows a deceptive policy in order to achieve a different task. We model the environment with a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize deception as a hypothesis testing problem and show that synthesizing optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which yields a convex optimization formulation of the deception problem, and we give an algorithm that converges to the optimal mixture policy. We also consider the special class of Markov decision processes with deterministic transition and observation functions. For this class, we give a randomized path-planning algorithm that generates a path for the agent in polynomial time and achieves the optimal value of the considered objective function.
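The mixture-policy formulation described above can be illustrated with a small numerical sketch. The code below is not the paper's algorithm; it is a minimal demonstration under invented assumptions: `K` hypothetical base policies with known observation distributions `Q` over a finite set of observation sequences, a reference observation distribution `p`, per-policy task-success probabilities `r`, and a success threshold `tau` are all synthetic placeholders. The deceptiveness objective is taken to be the KL divergence between the mixture's observation distribution and the reference distribution, minimized over the mixture weights with an off-the-shelf solver (SciPy's SLSQP).

```python
# Sketch of a mixture-policy optimization for deception (illustrative only).
# Assumptions: K base policies, each inducing a known distribution Q[k] over
# M observation sequences; the reference policy induces p. The mixture
# distribution w @ Q is linear in the weights w, and KL(q || p) is convex in
# q, so the program below is convex in w.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

K, M = 4, 10                             # hypothetical sizes for the demo
Q = rng.dirichlet(np.ones(M), size=K)    # Q[k]: observation distribution of base policy k
p = rng.dirichlet(np.ones(M))            # observation distribution of the reference policy
r = rng.uniform(size=K)                  # r[k]: task-success probability of base policy k
tau = 0.5                                # required task-success probability (assumption)

def kl_mixture(w):
    """KL divergence between the mixture's observation distribution and p."""
    q = w @ Q
    mask = q > 0                         # 0 * log(0) is treated as 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

constraints = [
    {"type": "eq",   "fun": lambda w: np.sum(w) - 1.0},  # weights on the simplex
    {"type": "ineq", "fun": lambda w: w @ r - tau},      # task achieved w.p. >= tau
]
w0 = np.full(K, 1.0 / K)                 # start from the uniform mixture
res = minimize(kl_mixture, w0, bounds=[(0.0, 1.0)] * K,
               constraints=constraints, method="SLSQP")
print("mixture weights:", np.round(res.x, 3))
print("KL to reference:", res.fun)
```

Because the mixture distribution is linear in the weights and the KL divergence is convex in its first argument, the relaxed problem is convex, which is what makes the mixture-policy approximation tractable despite the NP-hardness of the exact synthesis problem.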
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 4443-4450 |
| Number of pages | 8 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 68 |
| Issue number | 7 |
| DOIs | |
| State | Published - Jul 1 2023 |
Keywords
- Computational complexity
- Markov decision processes (MDPs)
- deception under partial observations
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Computer Science Applications