Exploring Recovery from Operating System Lockups

Francis M. David, Jeffrey C. Carlyle, Roy H. Campbell

Research output: Contribution to conference › Paper › peer-review


Operating system lockup errors can render a computer unusable by preventing the execution of other programs. Watchdog timers can be used to recover from a lockup by resetting the processor and rebooting the system when a lockup is detected. This results in a loss of unsaved data in running programs. Based on the observation that volatile memory is not affected when a processor reset occurs, we present an approach to recover from a watchdog reset with minimal or zero loss of application state. We study the resolution of lockup conditions using thread termination and using exception dispatch. Thread termination can still result in a usable system and is already used as a recovery strategy for other errors in Linux. Using exceptions allows developers to write code to handle a lockup within the erroneous thread and attempt application-transparent recovery. Fault injection experiments show that a significant percentage of lockups can be recovered by thread termination. Exception handling further improves the recoverability of the operating system.
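The two resolution strategies named in the abstract can be illustrated with a small user-space analogy. This is only a sketch under stated assumptions, not the authors' kernel mechanism: it does not model a hardware watchdog reset or preserved memory. Threads are stood in for by Python generators, a "lockup" is a generator that never finishes, and the names `LockupDetected`, `run_with_watchdog`, `budget`, and `policy` are all hypothetical.

```python
class LockupDetected(Exception):
    """Hypothetical exception delivered to a task flagged as locked up."""


def healthy(steps):
    # A task that finishes after a bounded number of steps.
    for _ in range(steps):
        yield
    return "done"


def stuck():
    # A task that never finishes and has no recovery code.
    while True:
        yield


def stuck_with_handler():
    # A task that never finishes on its own but handles the lockup exception.
    try:
        while True:
            yield
    except LockupDetected:
        return "recovered"


def run_with_watchdog(tasks, budget, policy):
    """Run each task; one that exceeds `budget` steps is treated as locked up.

    policy == "terminate": the stuck task is simply closed.
    policy == "exception": LockupDetected is raised inside the task first,
    and the task is terminated only if it does not handle the exception.
    """
    results = {}
    for name, task in tasks.items():
        ticks = 0
        try:
            while True:
                next(task)
                ticks += 1
                if ticks > budget:
                    if policy == "exception":
                        # Raises StopIteration if the task handles it and returns,
                        # or re-raises LockupDetected if the task has no handler.
                        task.throw(LockupDetected())
                    # Reached under "terminate", or if a handler kept running.
                    task.close()
                    results[name] = "terminated"
                    break
        except StopIteration as stop:
            results[name] = stop.value
        except LockupDetected:
            results[name] = "terminated"
    return results
```

In this sketch both policies leave the remaining tasks runnable, while only the exception policy gives a task the chance to clean up and report its own recovery, which loosely mirrors the distinction the abstract draws between thread termination and exception dispatch.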

Original language: English (US)
Number of pages: 6
State: Published - 2007
Event: 2007 USENIX Annual Technical Conference, USENIX 2007 - Santa Clara, United States
Duration: Jun 17 2007 → Jun 22 2007


Conference: 2007 USENIX Annual Technical Conference, USENIX 2007
Country/Territory: United States
City: Santa Clara

ASJC Scopus subject areas

  • General Computer Science

