The case for lifetime reliability-aware microprocessors

Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, Jude A. Rivers

Research output: Contribution to journal › Conference article › peer-review


Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem, by designing processors that dynamically adapt in response to the observed usage to meet a reliability target. We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM) - a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current worst-case behavior based reliability qualification methodologies, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
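
The DRM idea in the abstract (qualify at a likely operating point, then adapt at run time to stay on the reliability target) can be illustrated with a small Python sketch. This is not RAMP and not the authors' implementation: the Arrhenius-style failure-rate formula, the activation energy, the operating-point names, and the temperatures are all assumptions chosen only to make the control loop concrete.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def relative_failure_rate(temp_k, qual_temp_k, activation_ev=0.9):
    """Failure rate relative to the qualification temperature.

    Assumed Arrhenius-style model; activation_ev is an illustrative value,
    not a number taken from the paper.
    """
    return math.exp(activation_ev / BOLTZMANN_EV * (1.0 / qual_temp_k - 1.0 / temp_k))


def pick_level(levels, temp_of_level, qual_temp_k, consumed, elapsed):
    """Pick the fastest operating point that keeps the part on its target.

    levels: hypothetical operating points, ordered fastest first.
    temp_of_level: expected temperature in kelvin for each level (assumed known).
    consumed: budget used so far, in "intervals at qualification conditions".
    elapsed: number of intervals that have actually passed.
    """
    for level in levels:
        rate = relative_failure_rate(temp_of_level[level], qual_temp_k)
        # After the next interval, consumption must not exceed the budget
        # that running at qualification conditions would have used.
        if consumed + rate <= elapsed + 1:
            return level
    return levels[-1]  # nothing fits: fall back to the slowest level


# Hypothetical operating points, e.g. voltage/frequency settings.
LEVELS = ["fast", "mid", "slow"]
TEMPS = {"fast": 370.0, "mid": 355.0, "slow": 340.0}
QUAL_TEMP_K = 355.0

fresh = pick_level(LEVELS, TEMPS, QUAL_TEMP_K, consumed=0.0, elapsed=0)
banked = pick_level(LEVELS, TEMPS, QUAL_TEMP_K, consumed=5.0, elapsed=10)
overspent = pick_level(LEVELS, TEMPS, QUAL_TEMP_K, consumed=10.5, elapsed=10)
```

In this toy setup, a workload that has run cooler than the qualification point banks slack that can later be spent at a faster, hotter level, while a workload that has overspent its budget is throttled until it is back on target. The levels stand in for the dynamic voltage scaling response mentioned in the abstract.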

Original language: English (US)
Pages (from-to): 276-287
Number of pages: 12
Journal: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
State: Published - Oct 8 2004
Event: Proceedings - 31st Annual International Symposium on Computer Architecture - Munich, Germany
Duration: Jun 19 2004 to Jun 23 2004

ASJC Scopus subject areas

  • Hardware and Architecture


