TY - GEN
T1 - Effectiveness of machine checks for error diagnostics
AU - Pandit, Nikhil
AU - Kalbarczyk, Zbigniew
AU - Iyer, Ravishankar K.
PY - 2009
Y1 - 2009
N2 - Machine Check Architecture (MCA) is a processor-internal architecture subsystem that detects and logs correctable and uncorrectable errors in the data or control paths of each CPU core and the Northbridge. These errors include parity errors associated with caches and TLBs, ECC errors associated with caches and DRAM, and system bus errors. This paper reports on an experimental study that (i) monitors a computing cluster for machine checks and uses this data to identify patterns that can be employed for error diagnostics, and (ii) introduces faults into the machine to understand the resulting machine checks and correlates this data with relevant performance metrics.
AB - Machine Check Architecture (MCA) is a processor-internal architecture subsystem that detects and logs correctable and uncorrectable errors in the data or control paths of each CPU core and the Northbridge. These errors include parity errors associated with caches and TLBs, ECC errors associated with caches and DRAM, and system bus errors. This paper reports on an experimental study that (i) monitors a computing cluster for machine checks and uses this data to identify patterns that can be employed for error diagnostics, and (ii) introduces faults into the machine to understand the resulting machine checks and correlates this data with relevant performance metrics.
UR - http://www.scopus.com/inward/record.url?scp=70449984114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449984114&partnerID=8YFLogxK
U2 - 10.1109/DSN.2009.5270290
DO - 10.1109/DSN.2009.5270290
M3 - Conference contribution
AN - SCOPUS:70449984114
SN - 9781424444212
T3 - Proceedings of the International Conference on Dependable Systems and Networks
SP - 578
EP - 583
BT - Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
T2 - 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Y2 - 29 June 2009 through 2 July 2009
ER -