Abstract

This paper describes the Reliability MicroKernel (RMK) framework, a loadable kernel module (or device driver) that provides application-aware reliability and supports dynamic configuration of reliability mechanisms. Application-aware reliability techniques transparently exploit characteristics of application and system execution to achieve low-latency detection and low-overhead checkpointing. The RMK prototype is implemented in both Linux and Windows, and it supports detection of application and OS failures as well as transparent application checkpointing. Experimental results show that system hang detection and application hang detection, which exploit characteristics of application and system behavior, can achieve high coverage (100% observed in our experiments) with a low false positive rate. Moreover, the performance overhead of RMK and its detection and checkpointing mechanisms is small: 0.6% for application hang detection and 0.1% for transparent application checkpointing in the experiments.

Original language: English (US)
Pages (from-to): 597-614
Number of pages: 18
Journal: IEEE Transactions on Reliability
Volume: 56
Issue number: 4
State: Published - Dec 2007

Keywords

  • Application aware reliability
  • Checkpointing
  • Error detection
  • OS-level error detection
  • System crash/hang detection
  • Transparent application checkpointing

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Electrical and Electronic Engineering

Title: Reliability MicroKernel: Providing application-aware reliability in the OS
