Abstract

This paper describes an experimental study of Linux kernel behavior in the presence of errors that affect the instruction stream of the kernel code. Extensive error injection experiments, comprising over 35,000 injected errors, target the most frequently used functions in selected kernel subsystems. Three types of fault/error injection campaigns are conducted: (1) random non-branch instruction, (2) random conditional branch, and (3) valid but incorrect branch. Analysis of the collected data shows that: (i) 95% of the crashes are due to four major causes, namely unable to handle kernel NULL pointer, unable to handle kernel paging request, invalid opcode, and general protection fault; (ii) less than 10% of the crashes are associated with fault propagation, and nearly 40% of crash latencies are within 10 cycles; and (iii) errors in the kernel can result in crashes that require reformatting the file system to restore system operation, and bringing the system back up can take nearly an hour.
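
The four crash causes named in finding (i) correspond to messages the Linux kernel prints when it crashes. As a rough illustration of how crash logs could be sorted into those categories, here is a minimal Python sketch. It is not the authors' tooling: the substring-matching rule, the function names `classify_crash` and `cause_shares`, and the sample log lines are all assumptions made for this example.

```python
from collections import Counter

# Category label -> substring to look for in a kernel crash message.
# The four categories come from the abstract; matching by substring is an
# assumption of this sketch, not the paper's method.
CRASH_CAUSES = {
    "NULL pointer": "Unable to handle kernel NULL pointer dereference",
    "paging request": "Unable to handle kernel paging request",
    "invalid opcode": "invalid opcode",
    "general protection fault": "general protection fault",
}


def classify_crash(log_line: str) -> str:
    """Return the crash category for one log line, or 'other' if none match."""
    lowered = log_line.lower()
    for category, pattern in CRASH_CAUSES.items():
        if pattern.lower() in lowered:
            return category
    return "other"


def cause_shares(log_lines):
    """Map each category to its fraction of all given crash lines."""
    counts = Counter(classify_crash(line) for line in log_lines)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()} if total else {}


# Hypothetical log lines, made up for illustration (not data from the paper).
sample = [
    "Unable to handle kernel NULL pointer dereference at virtual address 00000000",
    "Unable to handle kernel paging request at virtual address c8a1f000",
    "invalid opcode: 0000",
    "general protection fault: 0000",
    "Kernel panic: Attempted to kill init!",
]
shares = cause_shares(sample)
```

Applied to real crash dumps from an injection campaign, a tally like `shares` is the kind of summary that would back a statement such as "95% of the crashes are due to four major causes".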

Original language: English (US)
Pages: 459-468
Number of pages: 10
State: Published - Dec 1 2003
Event: 2003 International Conference on Dependable Systems and Networks - San Francisco, CA, United States
Duration: Jun 22 2003 → Jun 25 2003

Fingerprint

Linux
Experiments

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Gu, W., Kalbarczyk, Z., Iyer, R. K., & Yang, Z. (2003). Characterization of Linux Kernel Behavior under Errors. 459-468. Paper presented at 2003 International Conference on Dependable Systems and Networks, San Francisco, CA, United States.

Research output: Contribution to conference › Paper

@conference{68d8256f3f2848b78e6d3ffe78efd076,
title = "Characterization of Linux Kernel Behavior under Errors",
abstract = "This paper describes an experimental study of Linux kernel behavior in the presence of errors that affect the instruction stream of the kernel code. Extensive error injection experiments, comprising over 35,000 injected errors, target the most frequently used functions in selected kernel subsystems. Three types of fault/error injection campaigns are conducted: (1) random non-branch instruction, (2) random conditional branch, and (3) valid but incorrect branch. Analysis of the collected data shows that: (i) 95{\%} of the crashes are due to four major causes, namely unable to handle kernel NULL pointer, unable to handle kernel paging request, invalid opcode, and general protection fault; (ii) less than 10{\%} of the crashes are associated with fault propagation, and nearly 40{\%} of crash latencies are within 10 cycles; and (iii) errors in the kernel can result in crashes that require reformatting the file system to restore system operation, and bringing the system back up can take nearly an hour.",
author = "Weining Gu and Zbigniew Kalbarczyk and Iyer, {Ravishankar K.} and Zhenyu Yang",
year = "2003",
month = "12",
day = "1",
language = "English (US)",
pages = "459--468",
note = "2003 International Conference on Dependable Systems and Networks ; Conference date: 22-06-2003 Through 25-06-2003",

}

TY - CONF

T1 - Characterization of Linux Kernel Behavior under Errors

AU - Gu, Weining

AU - Kalbarczyk, Zbigniew

AU - Iyer, Ravishankar K.

AU - Yang, Zhenyu

PY - 2003/12/1

Y1 - 2003/12/1

AB - This paper describes an experimental study of Linux kernel behavior in the presence of errors that affect the instruction stream of the kernel code. Extensive error injection experiments, comprising over 35,000 injected errors, target the most frequently used functions in selected kernel subsystems. Three types of fault/error injection campaigns are conducted: (1) random non-branch instruction, (2) random conditional branch, and (3) valid but incorrect branch. Analysis of the collected data shows that: (i) 95% of the crashes are due to four major causes, namely unable to handle kernel NULL pointer, unable to handle kernel paging request, invalid opcode, and general protection fault; (ii) less than 10% of the crashes are associated with fault propagation, and nearly 40% of crash latencies are within 10 cycles; and (iii) errors in the kernel can result in crashes that require reformatting the file system to restore system operation, and bringing the system back up can take nearly an hour.

UR - http://www.scopus.com/inward/record.url?scp=1542359963&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1542359963&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:1542359963

SP - 459

EP - 468

ER -