Exploiting Partial Observability for Optimal Deception

Mustafa O. Karabag, Melkior Ornik, Ufuk Topcu

Research output: Contribution to journalArticlepeer-review

Abstract

Deception is a useful tool in situations where an agent operates in the presence of its adversaries. We consider a setting where a supervisor provides a reference policy to an agent, expects the agent to operate in an environment by following the reference policy, and partially observes the agent's behavior. The agent instead follows a different deceptive policy to achieve a different task. We model the environment with a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize the notion of deception as a hypothesis testing problem and show that the synthesis of optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which provides a convex optimization formulation of the deception problem. We give an algorithm that converges to the optimal mixture policy. We also consider a special class of Markov decision processes where the transition and observation functions are deterministic. For this case, we give a randomized algorithm for path planning that generates a path for the agent in polynomial time and achieves the optimal value for the considered objective function.

Original languageEnglish (US)
Pages (from-to)4443-4450
Number of pages8
JournalIEEE Transactions on Automatic Control
Volume68
Issue number7
DOIs
StatePublished - Jul 1 2023

Keywords

  • Computational complexity
  • Markov decision processes (MDPs)
  • deception under partial observations

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Exploiting Partial Observability for Optimal Deception'. Together they form a unique fingerprint.

Cite this