TY - GEN
T1 - Least inferable policies for Markov decision processes
AU - Karabag, Mustafa O.
AU - Ornik, Melkior
AU - Topcu, Ufuk
N1 - Funding Information:
1Department of Electrical and Computer Engineering, University of Texas at Austin, USA. e-mail: karabag@utexas.edu 2Department of Aerospace Engineering and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. e-mail: mornik@illinois.edu 3Department of Aerospace Engineering and Engineering Mechanics and Institute for Computational Engineering and Sciences, University of Texas at Austin. e-mail: utopcu@utexas.edu ∗This work was performed while Melkior Ornik was with the Institute for Computational Engineering and Sciences, University of Texas at Austin. This work was supported in part by DARPA W911NF-16-1-0001, DARPA D19AP00004, and NSF 1652113.
Publisher Copyright:
© 2019 American Automatic Control Council.
PY - 2019/7
Y1 - 2019/7
N2 - In a variety of applications, an agent's success depends on the knowledge that an adversarial observer has or can gather about the agent's decisions. It is therefore desirable for the agent to achieve a task while reducing the ability of an observer to infer the agent's policy. We consider the task of the agent as a reachability problem in a Markov decision process and study the synthesis of policies that minimize the observer's ability to infer the transition probabilities of the agent between the states of the Markov decision process. We introduce a metric based on the Fisher information as a proxy for the information leaked to the observer and, using this metric, formulate a problem that minimizes the expected total information subject to the reachability constraint. We proceed to solve the problem using convex optimization methods. To verify the proposed method, we analyze the relationship between the expected total information and the estimation error of the observer, and show that, for a particular class of Markov decision processes, these two values are inversely proportional.
AB - In a variety of applications, an agent's success depends on the knowledge that an adversarial observer has or can gather about the agent's decisions. It is therefore desirable for the agent to achieve a task while reducing the ability of an observer to infer the agent's policy. We consider the task of the agent as a reachability problem in a Markov decision process and study the synthesis of policies that minimize the observer's ability to infer the transition probabilities of the agent between the states of the Markov decision process. We introduce a metric based on the Fisher information as a proxy for the information leaked to the observer and, using this metric, formulate a problem that minimizes the expected total information subject to the reachability constraint. We proceed to solve the problem using convex optimization methods. To verify the proposed method, we analyze the relationship between the expected total information and the estimation error of the observer, and show that, for a particular class of Markov decision processes, these two values are inversely proportional.
UR - http://www.scopus.com/inward/record.url?scp=85072277854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072277854&partnerID=8YFLogxK
U2 - 10.23919/acc.2019.8815129
DO - 10.23919/acc.2019.8815129
M3 - Conference contribution
AN - SCOPUS:85072277854
T3 - Proceedings of the American Control Conference
SP - 1224
EP - 1231
BT - 2019 American Control Conference, ACC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 American Control Conference, ACC 2019
Y2 - 10 July 2019 through 12 July 2019
ER -