An Interpretable Predictive Model for Early Detection of Hardware Failure

Artsiom Balakir, Alan Yang, Elyse Rosenbaum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper develops an accurate yet interpretable machine learning framework for predicting field failures from time-series diagnostic data with application to datacenter hard disk drive failure prediction. Interpretable models are accountable: model reasoning can be verified by a domain expert for critical reliability tasks. We develop an attention-augmented recurrent neural network that visualizes the temporal information used to generate predictions; visualizations correlate with physical expectations. Finally, we propose a clustering-based method for discovering failure modes.
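
As a rough illustration of how attention over a recurrent network's per-time-step outputs can show which time steps drive a prediction, here is a minimal sketch. The abstract does not specify the architecture, so everything in it is an assumption: the dot-product scoring, the toy hidden states, and the names `attention_pool` and `score_vector` are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, score_vector):
    """Hypothetical attention pooling (an assumption, not the authors' model).

    hidden_states: list of T vectors, one per time step (e.g. RNN outputs).
    score_vector:  vector used to score each time step by dot product.
    Returns (context, weights): the weighted average of the hidden states
    and the per-time-step attention weights, which sum to 1.
    """
    scores = [sum(h_d * v_d for h_d, v_d in zip(h, score_vector))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(score_vector)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights

# Toy hidden states for 4 time steps of diagnostic data (made-up values).
hidden_states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.5, 1.2]]
score_vector = [1.0, 1.0]

context, weights = attention_pool(hidden_states, score_vector)
for t, w in enumerate(weights):
    print(f"time step {t}: attention weight {w:.3f}")
```

Plotting `weights` against time is one way to inspect which part of the input window the model relied on. In this toy input the latest time step has the highest score, so it receives the largest weight.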

Original language: English (US)
Title of host publication: 2020 IEEE International Reliability Physics Symposium, IRPS 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728131993
State: Published - Apr 2020
Event: 2020 IEEE International Reliability Physics Symposium, IRPS 2020 - Virtual, Online, United States
Duration: Apr 28 2020 to May 30 2020

Publication series

Name: IEEE International Reliability Physics Symposium Proceedings
Volume: 2020-April
ISSN (Print): 1541-7026

Conference

Conference: 2020 IEEE International Reliability Physics Symposium, IRPS 2020
Country: United States
City: Virtual, Online
Period: 4/28/20 to 5/30/20

Keywords

  • Failure Prediction
  • Hard Disk Drives
  • Interpretable Prediction
  • Machine Learning
  • System Reliability

ASJC Scopus subject areas

  • Engineering(all)
