TY - GEN
T1 - Improving throughput of power-constrained GPUs using dynamic voltage/frequency and core scaling
AU - Lee, Jungseob
AU - Sathisha, Vijay
AU - Schulte, Michael
AU - Compton, Katherine
AU - Kim, Nam Sung
PY - 2011
Y1 - 2011
AB - State-of-the-art graphics processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the product is often limited by a power constraint. Although the throughput can be increased with more cores for some applications, it cannot for others because the parallelism of the applications and/or the bandwidth of on-chip interconnects/caches and off-chip memory is limited. In this paper, first, we demonstrate that adjusting the number of operating cores and the voltage/frequency of cores and/or on-chip interconnects/caches for different applications can improve the throughput of GPUs under a power constraint. Second, we show that dynamically scaling the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches at runtime can improve the throughput of applications even further. Our experimental results show that a GPU adopting our runtime dynamic voltage/frequency and core scaling technique can provide up to 38% (and nearly 20% on average) higher throughput than the baseline GPU under the same power constraint.
KW - Dynamic voltage, frequency, and core scaling
KW - GPU
KW - Power constraint
KW - Throughput
UR - http://www.scopus.com/inward/record.url?scp=84863037228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863037228&partnerID=8YFLogxK
U2 - 10.1109/PACT.2011.17
DO - 10.1109/PACT.2011.17
M3 - Conference contribution
AN - SCOPUS:84863037228
SN - 9780769545660
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 111
EP - 120
BT - Proceedings - 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
T2 - 20th International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
Y2 - 10 October 2011 through 14 October 2011
ER -