Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency

Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern GPUs share limited hardware resources, such as register files, among a large number of concurrently executing threads. For efficient resource sharing, several buffering and collision avoidance stages are inserted in the GPU pipeline. These additional stages increase the read-after-write (RAW) latencies of instructions. Since GPUs are often architected to hide RAW latencies through extensive multithreading, they typically do not employ power-hungry data-forwarding networks (DFNs). However, we observe that many GPGPU applications do not have enough active threads that are ready to issue instructions to hide these RAW latencies. In this paper, we first demonstrate that DFNs can considerably improve the performance of many compute-intensive GPGPU applications and then propose most recent result forwarding (MoRF) as a low-power alternative to the DFN. Second, for floating-point (FP) operations, we exploit a high-throughput fused multiply-add (HFMA) unit to further reduce both RAW latencies and the number of FMA units in the GPU without impacting instruction throughput. MoRF and HFMA together provide a geometric mean performance improvement of 18% and 29% for integer/single-precision and double-precision GPGPU applications, respectively. Finally, both MoRF and HFMA allow the GPU to effectively mimic a shallower pipeline for a large percentage of instructions. Exploiting such a benefit, we propose low-power pipelines that can reduce peak power consumption by 14% without affecting the performance or increasing the complexity of the forwarding network. The peak power reduction allows GPUs to operate more cores within the same power budget, achieving a geometric mean performance improvement of 33% for double-precision GPGPU applications.
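The reasoning in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper: the function names, the fully dependent instruction stream, and the assumption that power scales linearly with core count are simplifications introduced here for illustration. It shows why too few ready warps leave RAW latency exposed, how a geometric mean of per-application speedups is formed, and how a peak-power reduction could translate into extra cores under a fixed budget.

```python
import math


def issue_utilization(ready_warps: int, raw_latency: int) -> float:
    """Fraction of cycles in which an instruction can issue (toy model).

    Assumption: every instruction depends on the previous one in its warp,
    so each warp can issue at most once every `raw_latency` cycles.
    """
    if ready_warps < 0 or raw_latency < 1:
        raise ValueError("need ready_warps >= 0 and raw_latency >= 1")
    return min(1.0, ready_warps / raw_latency)


def geometric_mean(speedups):
    """Geometric mean of per-application speedups (each must be > 0)."""
    if not speedups:
        raise ValueError("need at least one speedup")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


def core_scaling(peak_power_reduction: float) -> float:
    """Relative core count that fits in the original power budget.

    Assumption: total peak power scales linearly with the number of cores.
    """
    if not 0.0 <= peak_power_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - peak_power_reduction)


if __name__ == "__main__":
    # Hypothetical latencies (24 and 8 cycles), chosen only for illustration.
    print(issue_utilization(ready_warps=6, raw_latency=24))
    print(issue_utilization(ready_warps=6, raw_latency=8))
    # The 14% figure is the peak-power reduction quoted in the abstract.
    print(round(core_scaling(0.14), 3))
```

In this toy model, shortening the effective RAW latency raises issue utilization only when there are fewer ready warps than latency cycles, which matches the situation the abstract describes for many GPGPU applications. The core-scaling figure is an upper bound under the linear-power assumption and should not be read as a result from the paper.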

Original language: English (US)
Title of host publication: MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Pages: 74-85
Number of pages: 12
State: Published - Dec 1 2013
Externally published: Yes
Event: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013 - Davis, CA, United States
Duration: Dec 7 2013 to Dec 11 2013

Publication series

Name: MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Other

Other: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013
Country: United States
City: Davis, CA
Period: 12/7/13 to 12/11/13

Keywords

  • GPUs
  • low-power
  • pipeline latencies

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
