TY - GEN
T1 - G-Scalar
T2 - 23rd IEEE Symposium on High Performance Computer Architecture, HPCA 2017
AU - Liu, Zhenhong
AU - Gilani, Syed
AU - Annavaram, Murali
AU - Kim, Nam Sung
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/5/5
Y1 - 2017/5/5
N2 - GPUs have provided higher throughput by integrating more execution resources into a single chip without unduly compromising power efficiency. With the power wall challenge, however, increasing throughput further will require a significant improvement in power efficiency. To accomplish this goal, in this paper we propose G-Scalar, a cost-effective generalized scalar execution architecture for GPUs. G-Scalar offers two key advantages over prior architectures, which support scalar execution only for non-divergent arithmetic/logic instructions. First, G-Scalar is more power-efficient because it also supports scalar execution of divergent and special-function instructions, whose share of the instructions in contemporary GPU applications has notably increased. Second, G-Scalar is less expensive because it can share most of its hardware resources with register value compression, whose adoption has been strongly promoted to reduce the high power consumption of accessing the large register file. Compared with the baseline and previous scalar architectures, G-Scalar improves power efficiency by 24% and 15%, respectively, at a negligible cost.
AB - GPUs have provided higher throughput by integrating more execution resources into a single chip without unduly compromising power efficiency. With the power wall challenge, however, increasing throughput further will require a significant improvement in power efficiency. To accomplish this goal, in this paper we propose G-Scalar, a cost-effective generalized scalar execution architecture for GPUs. G-Scalar offers two key advantages over prior architectures, which support scalar execution only for non-divergent arithmetic/logic instructions. First, G-Scalar is more power-efficient because it also supports scalar execution of divergent and special-function instructions, whose share of the instructions in contemporary GPU applications has notably increased. Second, G-Scalar is less expensive because it can share most of its hardware resources with register value compression, whose adoption has been strongly promoted to reduce the high power consumption of accessing the large register file. Compared with the baseline and previous scalar architectures, G-Scalar improves power efficiency by 24% and 15%, respectively, at a negligible cost.
KW - GPU
KW - register file
KW - register file compression
KW - scalar execution
UR - http://www.scopus.com/inward/record.url?scp=85019616743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019616743&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2017.51
DO - 10.1109/HPCA.2017.51
M3 - Conference contribution
AN - SCOPUS:85019616743
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 601
EP - 612
BT - Proceedings - 2017 IEEE 23rd Symposium on High Performance Computer Architecture, HPCA 2017
PB - IEEE Computer Society
Y2 - 4 February 2017 through 8 February 2017
ER -