Recovery domains: An organizing principle for recoverable operating systems

Andrew Lenharth, Vikram Adve, Samuel T. King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a strategy for enabling existing commodity operating systems to recover from unexpected run-time errors in nearly any part of the kernel, including core kernel components. Our approach is dynamic and request-oriented; it isolates the effects of a fault to the requests that caused the fault rather than to static kernel components. This approach is based on a notion of "recovery domains," an organizing principle to enable rollback of state affected by a request in a multithreaded system with minimal impact on other requests or threads. We have applied this approach to v2.4.22 and v2.6.27 of the Linux kernel, and it required only 132 lines of changed or new code; all other changes are performed by a simple compiler instrumentation pass. Our experiments show that the approach is able to recover from otherwise fatal faults with minimal collateral impact during a recovery event.
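The request-oriented rollback idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: `RecoveryDomain`, `run_request`, and the dictionary standing in for kernel state are all hypothetical names chosen for illustration, and the sketch is single-threaded, so it does not model interactions between concurrent requests.

```python
class RecoveryDomain:
    """Hypothetical per-request undo log (illustrative; not the paper's API)."""

    def __init__(self, state):
        self.state = state    # shared dict standing in for kernel state
        self.undo_log = []    # (key, had_key, old_value) in write order

    def write(self, key, value):
        # Record the prior value before overwriting so it can be restored.
        self.undo_log.append((key, key in self.state, self.state.get(key)))
        self.state[key] = value

    def rollback(self):
        # Undo writes in reverse order to restore the pre-request state.
        while self.undo_log:
            key, had_key, old = self.undo_log.pop()
            if had_key:
                self.state[key] = old
            else:
                self.state.pop(key, None)


def run_request(state, handler):
    """Run handler in its own domain; roll back only its writes on a fault."""
    domain = RecoveryDomain(state)
    try:
        handler(domain)
        return True
    except Exception:
        domain.rollback()
        return False


def good_request(domain):
    domain.write("open_files", 4)


def faulty_request(domain):
    domain.write("open_files", 99)
    domain.write("tmp", 1)
    raise RuntimeError("simulated fault")


shared = {"open_files": 3}
ok_good = run_request(shared, good_request)
ok_bad = run_request(shared, faulty_request)
```

In this sketch the faulting request's writes are discarded while the earlier successful request's update is kept, which mirrors the abstract's claim that a fault is confined to the request that caused it rather than to a static component.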

Original language: English (US)
Title of host publication: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-14
Publisher: Association for Computing Machinery
Pages: 49-60
Number of pages: 12
ISBN (Print): 9781605584065
DOIs: https://doi.org/10.1145/1508244.1508251
State: Published - Jan 1 2009
Event: 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-14 - Washington, DC, United States
Duration: Mar 7 2009 to Mar 11 2009

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Other

Other: 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-14
Country: United States
City: Washington, DC
Period: 3/7/09 to 3/11/09

Keywords

  • Akeso
  • Automatic fault recovery
  • Recovery domains

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Lenharth, A., Adve, V., & King, S. T. (2009). Recovery domains: An organizing principle for recoverable operating systems. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-14 (pp. 49-60). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). Association for Computing Machinery. https://doi.org/10.1145/1508244.1508251