Precision-aware soft error protection for GPUs

David J. Palframan, Nam Sung Kim, Mikko H. Lipasti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughput, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
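The observation that error magnitude depends on which bit is corrupted can be illustrated with a minimal sketch. It is not the authors' method or evaluation setup. It assumes IEEE 754 single-precision values (the abstract does not name a format), and the helper names `flip_bit` and `relative_error` are hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` rounded to float32 with one bit flipped.

    Assumed layout (IEEE 754 binary32): bits 0-22 mantissa,
    bits 23-30 exponent, bit 31 sign.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def relative_error(value: float, bit: int) -> float:
    """Relative magnitude of the error caused by flipping `bit` (value != 0)."""
    (original,) = struct.unpack("<f", struct.pack("<f", value))
    return abs(flip_bit(value, bit) - original) / abs(original)


if __name__ == "__main__":
    # Compare a low mantissa bit, the top mantissa bit, an exponent bit,
    # and the sign bit for the same stored value.
    for bit in (0, 22, 23, 31):
        print(bit, flip_bit(1.0, bit), relative_error(1.0, bit))
```

For a stored value of 1.0, flipping bit 0 changes the result by 2^-23, while flipping bit 31 negates it. This gap is the kind of difference that motivates spending a fixed protection budget on the bits that cause large-magnitude errors.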

Original language: English (US)
Title of host publication: 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
Publisher: IEEE Computer Society
Pages: 49-59
Number of pages: 11
ISBN (Print): 9781479930975
State: Published - 2014
Externally published: Yes
Event: 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014 - Orlando, FL, United States
Duration: Feb 15 2014 → Feb 19 2014

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print): 1530-0897

Other

Other: 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
Country/Territory: United States
City: Orlando, FL
Period: 2/15/14 → 2/19/14

ASJC Scopus subject areas

  • Hardware and Architecture
