Recovery-driven design: Exploiting error resilience in design of energy-efficient processors

Andrew B. Kahng, Seokhyeong Kang, Rakesh Kumar, John Sartori

Research output: Contribution to journalArticlepeer-review

Abstract

Conventional computer-aided design (CAD) methodologies optimize a processor module for correct operation and prohibit timing violations during nominal operation. We propose recovery-driven design, a design approach that optimizes a processor module for a target timing error rate (ER) instead of correct operation. The target ER is chosen based on how many errors can be gainfully tolerated by a hardware or software error resilience mechanism. We show that significant power benefits are possible from a recovery-driven design approach that deliberately allows errors caused by voltage overscaling to occur during nominal operation, while relying on an error resilience technique to tolerate these errors. We present a detailed evaluation and analysis of such a CAD methodology that minimizes the power of a processor module for a target ER. We show how this design-level methodology can be extended to design recovery-driven processorsprocessors that are optimized to take advantage of hardware or software error resilience. We also discuss a gradual slack recovery-driven design approach that optimizes for a range of ERs to create soft processorsprocessors that have graceful failure characteristics and the ability to trade throughput or output quality for additional energy savings over a range of ERs. We demonstrate significant power benefits over conventional design11.8% on average over all modules and ER targets, and up to 29.1% for individual modules. Processor-level benefits were 19.0%, on average. Benefits increase when recovery-driven design is coupled with an error resilience mechanism or when the number of available voltage domains increases.

Original languageEnglish (US)
Article number6152777
Pages (from-to)404-417
Number of pages14
JournalIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume31
Issue number3
DOIs
StatePublished - Mar 2012

Keywords

  • Cell sizing
  • error resilience
  • power minimization
  • recovery-driven design
  • slack redistribution

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Recovery-driven design: Exploiting error resilience in design of energy-efficient processors'. Together they form a unique fingerprint.

Cite this