TY - GEN
T1 - Precision-aware soft error protection for GPUs
AU - Palframan, David J.
AU - Kim, Nam Sung
AU - Lipasti, Mikko H.
PY - 2014
Y1 - 2014
N2 - With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughput, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
AB - With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughput, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
UR - http://www.scopus.com/inward/record.url?scp=84904021063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904021063&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2014.6835966
DO - 10.1109/HPCA.2014.6835966
M3 - Conference contribution
AN - SCOPUS:84904021063
SN - 9781479930975
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 49
EP - 59
BT - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
PB - IEEE Computer Society
T2 - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
Y2 - 15 February 2014 through 19 February 2014
ER -