TY - JOUR
T1 - Software Fault Tolerance for Cyber-Physical Systems via Full System Restart
AU - Jagtap, Pushpak
AU - Abdi, Fardin
AU - Rungger, Matthias
AU - Zamani, Majid
AU - Caccamo, Marco
N1 - This work was supported in part by the H2020 ERC Starting Grant AutoCPS (grant agreement No 804639), the German Research Foundation (DFG) through the grant ZA 873/1-1, and the TUM International Graduate School of Science and Engineering (IGSSE). The material presented in this article is also based upon work supported by the National Science Foundation (NSF) under grant number CNS-1646383. Marco Caccamo was also supported by an Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research. Authors’ addresses: P. Jagtap and M. Rungger, Department of Electrical and Computer Engineering, Technical University of Munich, Arcistrasse 21, Germany; emails: [email protected], [email protected]; F. Abdi, Uber, Seattle, USA; email: [email protected]; M. Zamani, Computer Science Department, University of Colorado Boulder, 1111 Engineering Drive, Boulder, CO 80309-0430 USA; email: [email protected]; M. Caccamo, Department of Mechanical Engineering, Technical University of Munich, Boltzmannstrasse 15, 85748, Germany; email: [email protected]. © 2020 Association for Computing Machinery. 2378-962X/2020/08-ART47 $15.00 https://doi.org/10.1145/3407183
PY - 2020/8
Y1 - 2020/8
AB - This article addresses the reliability of complex embedded control systems in safety-critical environments. We propose a novel approach to designing a controller that (i) guarantees the safety of nonlinear physical systems, (ii) enables safe system restart during runtime, and (iii) allows the use of complex, unverified controllers (e.g., neural networks) that drive the physical systems toward complex specifications. We use an abstraction-based controller synthesis approach to design a formally verified controller that provides application-level and system-level fault tolerance along with a safety guarantee. Moreover, our approach is implementable using a commercial-off-the-shelf (COTS) processing unit. To demonstrate the efficacy of our solution and to verify the safety of the system under various types of faults injected in applications and in the underlying real-time operating system (RTOS), we implemented the proposed controller for an inverted pendulum and a three-degrees-of-freedom (3-DOF) helicopter.
KW - Cyber-physical systems
KW - abstraction-based control
KW - fault-tolerance
KW - full system restart
KW - nonlinear systems
UR - http://www.scopus.com/inward/record.url?scp=85095976493&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095976493&partnerID=8YFLogxK
U2 - 10.1145/3407183
DO - 10.1145/3407183
M3 - Article
AN - SCOPUS:85095976493
SN - 2378-962X
VL - 4
JO - ACM Transactions on Cyber-Physical Systems
JF - ACM Transactions on Cyber-Physical Systems
IS - 4
M1 - 47
ER -