TY - GEN
T1 - Secure Linear Quadratic Regulator Using Sparse Model-Free Reinforcement Learning
AU - Kiumarsi, Bahare
AU - Basar, Tamer
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - This paper presents a resilient model-free reinforcement learning solution to linear quadratic regulator control of cyber-physical systems under sensor attacks. To guarantee resiliency to sensor attacks, a sparse least-squares optimization is introduced to solve the Bellman equation. Although the Bellman equation does not involve the system dynamics explicitly, it implicitly solves a Lyapunov equation that depends on those dynamics. Thus, data that are corrupted and do not follow the dynamics produce an error in the Bellman equation. Therefore, under a strong observability assumption, namely s-sparse observability, the proposed sparse optimization ensures that data from compromised sensors, which lead to a sizable error in the Bellman equation, have no effect on reconstructing the state of the system and, thus, on the evaluation of the policy. That is, only sensor outputs that result in a small Bellman-equation error affect the policy evaluation. Once the optimal control policy is found, it is applied to the system until a surprise signal, driven by the Bellman error, is activated to indicate a change caused by a new attack or by a change in the system dynamics.
AB - This paper presents a resilient model-free reinforcement learning solution to linear quadratic regulator control of cyber-physical systems under sensor attacks. To guarantee resiliency to sensor attacks, a sparse least-squares optimization is introduced to solve the Bellman equation. Although the Bellman equation does not involve the system dynamics explicitly, it implicitly solves a Lyapunov equation that depends on those dynamics. Thus, data that are corrupted and do not follow the dynamics produce an error in the Bellman equation. Therefore, under a strong observability assumption, namely s-sparse observability, the proposed sparse optimization ensures that data from compromised sensors, which lead to a sizable error in the Bellman equation, have no effect on reconstructing the state of the system and, thus, on the evaluation of the policy. That is, only sensor outputs that result in a small Bellman-equation error affect the policy evaluation. Once the optimal control policy is found, it is applied to the system until a surprise signal, driven by the Bellman error, is activated to indicate a change caused by a new attack or by a change in the system dynamics.
KW - Linear quadratic regulator
KW - Reinforcement learning
KW - Resilient control
UR - http://www.scopus.com/inward/record.url?scp=85082445654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082445654&partnerID=8YFLogxK
U2 - 10.1109/CDC40024.2019.9028861
DO - 10.1109/CDC40024.2019.9028861
M3 - Conference contribution
AN - SCOPUS:85082445654
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 3641
EP - 3647
BT - 2019 IEEE 58th Conference on Decision and Control, CDC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 58th IEEE Conference on Decision and Control, CDC 2019
Y2 - 11 December 2019 through 13 December 2019
ER -
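
The abstract above describes model-free policy evaluation through the Bellman equation and a sparsity-promoting least-squares fit, with the Bellman error driving a surprise signal. The LaTeX note below is a minimal sketch of that idea under simplifying assumptions (full state measurements and a fixed stabilizing policy); it is not the paper's exact formulation, which works with sensor outputs and s-sparse observability.

% Minimal sketch (assumptions: full state measurements, fixed stabilizing
% policy); not the exact formulation of Kiumarsi and Basar (CDC 2019).
\documentclass{article}
\usepackage{amsmath}
\begin{document}
For dynamics $x_{k+1} = A x_k + B u_k$ under a fixed policy $u_k = -K x_k$
with quadratic value $V(x) = x^\top P x$, the Bellman equation evaluated on
measured data tuples $(x_k, u_k, x_{k+1})$ is
\begin{equation}
  x_k^\top P\, x_k
  = x_k^\top Q\, x_k + u_k^\top R\, u_k + x_{k+1}^\top P\, x_{k+1},
\end{equation}
which requires no model knowledge, yet its solution also satisfies the
Lyapunov equation $P = Q + K^\top R K + (A - BK)^\top P (A - BK)$ and hence
depends implicitly on the dynamics. The Bellman error of a tuple is
\begin{equation}
  e_k(P) = x_k^\top Q\, x_k + u_k^\top R\, u_k
           + x_{k+1}^\top P\, x_{k+1} - x_k^\top P\, x_k .
\end{equation}
Tuples generated by the true dynamics give $e_k(P^\ast) = 0$ at the Lyapunov
solution $P^\ast$, whereas tuples corrupted by sensor attacks generally do
not. A sparsity-promoting least-squares fit of $P$, of the kind the abstract
refers to, down-weights tuples with large $|e_k|$ so that only data
consistent with the dynamics shape the policy evaluation, and a threshold on
$|e_k|$ can serve as the surprise signal indicating a new attack or a change
in the dynamics.
\end{document}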