TY - GEN
T1 - GPU register file virtualization
AU - Jeon, Hyeran
AU - Ravi, Gokul Subramanian
AU - Kim, Nam Sung
AU - Annavaram, Murali
N1 - Funding Information:
This work was done when Hyeran Jeon was a PhD student at the University of Southern California. This work was supported in part by NSF (0954211 and 1217102) and DARPA (HR0011-12-2-0020). Nam Sung Kim has a financial interest in AMD and Samsung Electronics.
Publisher Copyright:
© 2015 ACM.
PY - 2015/12/5
Y1 - 2015/12/5
AB - To support a massive number of parallel thread contexts, Graphics Processing Units (GPUs) use a huge register file, which is responsible for a large fraction of a GPU's total power and area. The conventional belief is that a large register file is inevitable for accommodating more parallel thread contexts, and that technology scaling makes it feasible to incorporate register files of ever-increasing size. In this paper, we demonstrate that the register file need not be large to accommodate more thread contexts. We first characterize the useful lifetime of a register and show that register lifetimes vary drastically across the registers allocated to a kernel. While some registers are alive for the entire duration of the kernel execution, others have a short lifespan. We propose GPU register file virtualization, which allows multiple warps to share physical registers. Since warps may be scheduled for execution at different points in time, we propose to proactively release dead registers from one warp and re-allocate them to a different warp that may be scheduled later, thereby reducing the needless demand for physical registers. By using register virtualization, we shrink the architected register space to a smaller physical register space. By under-provisioning the physical register file to be smaller than the architected register file, we reduce dynamic and static power consumption. We then develop a new register throttling mechanism to run applications that exceed the size of the under-provisioned register file without any deadlock. Our evaluation shows that even after halving the architected register file size, applications run successfully under our proposed GPU register file virtualization with negligible performance overhead.
KW - GPGPU
KW - energy-efficient computing
KW - microarchitecture
KW - register file
UR - http://www.scopus.com/inward/record.url?scp=84959889805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959889805&partnerID=8YFLogxK
U2 - 10.1145/2830772.2830784
DO - 10.1145/2830772.2830784
M3 - Conference contribution
AN - SCOPUS:84959889805
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 420
EP - 432
BT - Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
PB - IEEE Computer Society
T2 - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Y2 - 5 December 2015 through 9 December 2015
ER -