Building a self-healing operating system

Francis M. David, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

User applications and data in volatile memory are usually lost when an operating system crashes because of errors caused by either hardware or software faults. This is because most operating systems are designed to stop working when some internal errors are detected, despite the possibility that user data and applications might still be intact and recoverable. Techniques like exception handling, code reloading, operating system component isolation, micro-rebooting, automatic system service restarts, watchdog-timer-based recovery, and transactional components can be applied to attempt self-healing of an operating system from a wide variety of errors. Fault injection experiments show that these techniques can be used to continue running user applications after transparently recovering the operating system in a large percentage of cases. In cases where transparent recovery is not possible, individual process recovery can be attempted as a last resort.

Original language: English (US)
Title of host publication: Proceedings - DASC 2007
Subtitle of host publication: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing
Pages: 3-10
Number of pages: 8
State: Published - 2007
Event: DASC 2007: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing - Columbia, MD, United States
Duration: Sep 25 2007 to Sep 26 2007

Publication series

Name: Proceedings - DASC 2007: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing

Other

Other: DASC 2007: Third IEEE International Symposium on Dependable, Autonomic and Secure Computing
Country/Territory: United States
City: Columbia, MD
Period: 9/25/07 to 9/26/07

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering
