TY - JOUR
T1 - Exploiting Partial Observability for Optimal Deception
AU - Karabag, Mustafa O.
AU - Ornik, Melkior
AU - Topcu, Ufuk
N1 - Funding Information:
This work was supported in part by the Office of Naval Research under Grant N00014-21-1-2502, in part by the Defense Advanced Research Projects Agency under Grant D19AP00004, and in part by the Air Force Research Laboratory under Grant FA9550-19-1-0169.
Publisher Copyright:
© 1963-2012 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Deception is a useful tool in situations where an agent operates in the presence of its adversaries. We consider a setting where a supervisor provides a reference policy to an agent, expects the agent to operate in an environment by following the reference policy, and partially observes the agent's behavior. The agent instead follows a different deceptive policy to achieve a different task. We model the environment with a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize the notion of deception as a hypothesis testing problem and show that the synthesis of optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which provides a convex optimization formulation of the deception problem. We give an algorithm that converges to the optimal mixture policy. We also consider a special class of Markov decision processes where the transition and observation functions are deterministic. For this case, we give a randomized algorithm for path planning that generates a path for the agent in polynomial time and achieves the optimal value for the considered objective function.
AB - Deception is a useful tool in situations where an agent operates in the presence of its adversaries. We consider a setting where a supervisor provides a reference policy to an agent, expects the agent to operate in an environment by following the reference policy, and partially observes the agent's behavior. The agent instead follows a different deceptive policy to achieve a different task. We model the environment with a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize the notion of deception as a hypothesis testing problem and show that the synthesis of optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which provides a convex optimization formulation of the deception problem. We give an algorithm that converges to the optimal mixture policy. We also consider a special class of Markov decision processes where the transition and observation functions are deterministic. For this case, we give a randomized algorithm for path planning that generates a path for the agent in polynomial time and achieves the optimal value for the considered objective function.
KW - Computational complexity
KW - Markov decision processes (MDPs)
KW - deception under partial observations
UR - http://www.scopus.com/inward/record.url?scp=85139526028&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139526028&partnerID=8YFLogxK
U2 - 10.1109/TAC.2022.3209959
DO - 10.1109/TAC.2022.3209959
M3 - Article
AN - SCOPUS:85139526028
SN - 0018-9286
VL - 68
SP - 4443
EP - 4450
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 7
ER -