CuriOS: Improving reliability through operating system structure

Francis M. David, Ellick M. Chan, Jeffrey C. Carlyle, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An error that occurs in a microkernel operating system service can potentially result in state corruption and service failure. A simple restart of the failed service is not always the best solution for reliability. Blindly restarting a service which maintains client-related state such as session information results in the loss of this state and affects all clients that were using the service. CuriOS represents a novel OS design that uses lightweight distribution, isolation and persistence of OS service state to mitigate the problem of state loss during a restart. The design also significantly reduces error propagation within client-related state maintained by an OS service. This is achieved by encapsulating services in separate protection domains and granting access to client-related state only when required for request processing. Fault injection experiments show that it is possible to recover from between 87% and 100% of manifested errors in OS services such as the file system, network, timer and scheduler while maintaining low performance overheads.
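
The central idea of the abstract, keeping client-related state outside the service and granting access only while a request is processed, can be illustrated with a toy sketch. This is not the CuriOS implementation or its API: `StateStore`, `CounterService`, `grant` and `handle` are hypothetical names, and an ordinary Python object stands in for a separate protection domain, so nothing here is actually enforced by hardware or the kernel.

```python
from contextlib import contextmanager


class StateStore:
    """Holds per-client state outside any service (hypothetical stand-in
    for state kept in a separate protection domain)."""

    def __init__(self):
        self._regions = {}     # client_id -> dict holding that client's state
        self._granted = set()  # clients whose state is currently accessible

    @contextmanager
    def grant(self, client_id):
        # Access is granted only for the duration of one request.
        self._granted.add(client_id)
        try:
            yield self._regions.setdefault(client_id, {})
        finally:
            self._granted.discard(client_id)

    def is_granted(self, client_id):
        return client_id in self._granted


class CounterService:
    """A toy service that keeps no client state of its own."""

    def handle(self, store, client_id):
        # Only the requesting client's state is reachable here.
        with store.grant(client_id) as state:
            state["count"] = state.get("count", 0) + 1
            return state["count"]


store = StateStore()
service = CounterService()
service.handle(store, "alice")
service.handle(store, "alice")
service.handle(store, "bob")

# Simulate a crash and restart: the service object is replaced,
# but the client state lives in the store and survives.
service = CounterService()
after_restart = service.handle(store, "alice")
```

In this sketch the restarted service continues from the state the clients had before the restart, and a fault while serving one client can only touch the single region granted for that request.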

Original language: English (US)
Title of host publication: Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008
Publisher: USENIX Association
Pages: 59-72
Number of pages: 14
ISBN (Electronic): 9781931971652
State: Published - Jan 1 2019
Event: 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008 - San Diego, United States
Duration: Dec 8 2008 → Dec 10 2008

Publication series

Name: Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008

Conference

Conference: 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008
Country: United States
City: San Diego
Period: 12/8/08 → 12/10/08

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

David, F. M., Chan, E. M., Carlyle, J. C., & Campbell, R. H. (2019). CuriOS: Improving reliability through operating system structure. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008 (pp. 59-72). (Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008). USENIX Association.