Abstract

Machine Check Architecture (MCA) is a processor-internal subsystem that detects and logs correctable and uncorrectable errors in the data and control paths of each CPU core and the Northbridge. These errors include parity errors associated with caches and TLBs, ECC errors associated with caches and DRAM, and system bus errors. This paper reports on an experimental study that (i) monitors a computing cluster for machine checks and uses the collected data to identify patterns that can be employed for error diagnostics, and (ii) injects faults into a machine to understand the resulting machine checks and correlate them with relevant performance metrics.

Original language: English (US)
Title of host publication: Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Pages: 578-583
Number of pages: 6
State: Published - Nov 25 2009
Event: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009 - Lisbon, Portugal
Duration: Jun 29 2009 to Jul 2 2009

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Other

Other: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Country: Portugal
City: Lisbon
Period: 6/29/09 to 7/2/09

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
