TY - GEN
T1 - A hierarchical approach for dependability analysis of a commercial cache-based RAID storage architecture
AU - Kaâniche, M.
AU - Romano, L.
AU - Kalbarczyk, Z.
AU - Iyer, R.
AU - Karcich, R.
N1 - Funding Information:
The authors are grateful to the anonymous reviewers whose comments helped improve the presentation of the paper and to Fran Baker for her insightful editing of our manuscript. This work was supported by the National Aeronautics and Space Administration (NASA) under grant NAG-1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS), and by the Advanced Research Projects Agency under grant DABT63-94-C-0045. The findings, opinions, and recommendations expressed herein are those of the authors and do not necessarily reflect the position or policy of the United States Government or the University of Illinois, and no official endorsement should be inferred.
PY - 1998
Y1 - 1998
N2 - We present a hierarchical simulation approach for the dependability analysis and evaluation of a highly available commercial cache-based RAID storage system. The architecture is complex and includes several layers of overlapping error detection and recovery mechanisms. Three abstraction levels have been developed to model the cache architecture, cache operations, and error detection and recovery mechanisms. The impact of faults and errors occurring in the cache and in the disks is analyzed at each level of the hierarchy. A simulation submodel is associated with each abstraction level. The models have been developed using DEPEND, a simulation-based environment for system-level dependability analysis, which provides facilities to inject faults into a functional behavior model, to simulate error detection and recovery mechanisms, and to evaluate quantitative measures. Several fault models are defined for each submodel to simulate cache component failures, disk failures, transmission errors, and data errors in the cache memory and in the disks. Some of the parameters characterizing fault injection in a given submodel correspond to probabilities evaluated from the simulation of the lower-level submodel. Based on the proposed methodology, we evaluate and analyze 1) the system behavior under a real workload and high error rate (focusing on error bursts), 2) the coverage of the error detection mechanisms implemented in the system and the error latency distributions, and 3) the accumulation of errors in the cache and in the disks.
UR - http://www.scopus.com/inward/record.url?scp=27544457277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27544457277&partnerID=8YFLogxK
U2 - 10.1109/FTCS.1998.689450
DO - 10.1109/FTCS.1998.689450
M3 - Conference contribution
AN - SCOPUS:27544457277
T3 - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
SP - 6
EP - 15
BT - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Y2 - 23 June 1998 through 25 June 1998
ER -