TY - GEN
T1 - Building a self-healing operating system
AU - David, Francis M.
AU - Campbell, Roy H.
PY - 2007
Y1 - 2007
N2 - User applications and data in volatile memory are usually lost when an operating system crashes because of errors caused by either hardware or software faults. This is because most operating systems are designed to stop working when some internal errors are detected, despite the possibility that user data and applications might still be intact and recoverable. Techniques like exception handling, code reloading, operating system component isolation, micro-rebooting, automatic system service restarts, watchdog-timer-based recovery and transactional components can be applied to attempt self-healing of an operating system from a wide variety of errors. Fault injection experiments show that these techniques can be used to continue running user applications after transparently recovering the operating system in a large percentage of cases. In cases where transparent recovery is not possible, individual process recovery can be attempted as a last resort.
UR - http://www.scopus.com/inward/record.url?scp=38049006933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049006933&partnerID=8YFLogxK
U2 - 10.1109/ISDASC.2007.4351383
DO - 10.1109/ISDASC.2007.4351383
M3 - Conference contribution
AN - SCOPUS:38049006933
SN - 0769529852
SN - 9780769529851
T3 - Proceedings - DASC 2007: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing
SP - 3
EP - 10
BT - Proceedings - DASC 2007
T2 - DASC 2007: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing
Y2 - 25 September 2007 through 26 September 2007
ER -