TY - GEN
T1 - Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation
AU - Lee, Jungseob
AU - Ajgaonkar, Paritosh Pratap
AU - Kim, Nam Sung
PY - 2011/5/30
Y1 - 2011/5/30
N2 - State-of-the-art general-purpose graphics processing units (GPGPUs) offer very high computational throughput for general-purpose, highly parallel applications by using hundreds of on-chip cores. However, as technology scales below 65 nm, each core's maximum frequency varies significantly because of increasing within-die variations. This, in turn, diminishes the throughput improvement that GPGPUs gain from technology scaling, because the processor's maximum frequency is often limited by its slowest core. In this paper, we investigate two techniques that can mitigate the impact of frequency variations on GPGPU throughput: 1) running each core at its own maximum frequency independently and 2) disabling the slowest cores. Both can maximize GPGPU frequency, at either the individual-core or the entire-processor level. Our experimental results using a GPGPU simulator and a 32 nm technology show that the first and second techniques can improve the throughput of compute- and problem-size-bounded applications by up to 32% and 19%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=79957530506&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957530506&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2011.5762740
DO - 10.1109/ISPASS.2011.5762740
M3 - Conference contribution
AN - SCOPUS:79957530506
SN - 9781612843681
T3 - ISPASS 2011 - IEEE International Symposium on Performance Analysis of Systems and Software
SP - 237
EP - 246
BT - ISPASS 2011 - IEEE International Symposium on Performance Analysis of Systems and Software
T2 - IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2011
Y2 - 10 April 2011 through 12 April 2011
ER -