TY - GEN
T1 - Application and system-level software fault tolerance through full system restarts
AU - Abdi, Fardin
AU - Tabish, Rohan
AU - Rungger, Matthias
AU - Zamani, Majid
AU - Caccamo, Marco
N1 - Funding Information:
This work is supported by the German Research Foundation (DFG) through the grant ZA 873/1-1, the TUM International Graduate School of Science and Engineering (IGSSE), and the National Science Foundation (NSF) under grant numbers CNS-1302563 and CNS-1646383. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF and other sponsors.
Publisher Copyright:
© 2017 ACM.
PY - 2017/4/18
Y1 - 2017/4/18
AB - Due to growing performance requirements, embedded systems are increasingly complex. Meanwhile, they are also expected to be reliable. Guaranteeing reliability on complex systems is very challenging. Consequently, there is a substantial need for designs that enable the use of unverified components, such as a real-time operating system (RTOS), without requiring their correctness to guarantee safety. In this work, we propose a novel approach to designing a controller that enables the system to restart and remain safe during and after the restart. Complementing this controller with a switching logic allows the system to use a complex, unverified controller to drive the system as long as it does not jeopardize safety. Such a design also tolerates faults that occur in the underlying software layers, such as the RTOS and middleware, and recovers from them through system-level restarts that reinitialize the software (middleware, RTOS, and applications) from read-only storage. Our approach is implementable using one commercial off-the-shelf (COTS) processing unit. To demonstrate the efficacy of our solution, we fully implement a controller for a 3-degree-of-freedom (3DOF) helicopter. We test the system by injecting various types of faults into the applications and RTOS and verify that the system remains safe.
KW - Cyber-physical systems
KW - Embedded systems
KW - Fault-recovery
KW - Fault-tolerance
KW - Reliability
KW - Runtime restart
UR - http://www.scopus.com/inward/record.url?scp=85019047147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019047147&partnerID=8YFLogxK
U2 - 10.1145/3055004.3055012
DO - 10.1145/3055004.3055012
M3 - Conference contribution
AN - SCOPUS:85019047147
T3 - Proceedings - 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems, ICCPS 2017 (part of CPS Week)
SP - 197
EP - 206
BT - Proceedings - 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems, ICCPS 2017 (part of CPS Week)
PB - Association for Computing Machinery
T2 - 8th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2017
Y2 - 18 April 2017 through 20 April 2017
ER -