TY - GEN
T1 - Adaptive Cache Management for Energy-Efficient GPU Computing
AU - Chen, Xuhao
AU - Chang, Li Wen
AU - Rodrigues, Christopher I.
AU - Lv, Jie
AU - Wang, Zhiying
AU - Hwu, Wen Mei
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2015/1/15
Y1 - 2015/1/15
N2 - With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch of the throughput-oriented execution model and its cache hierarchy design, which limits system performance and energy efficiency. The massive amount of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies that are designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by the bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the active number of warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement), compared to the baseline GPU architecture and optimal static warp throttling, respectively.
AB - With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch of the throughput-oriented execution model and its cache hierarchy design, which limits system performance and energy efficiency. The massive amount of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies that are designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by the bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the active number of warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement), compared to the baseline GPU architecture and optimal static warp throttling, respectively.
KW - GPGPU
KW - bypass
KW - cache management
KW - warp throttling
UR - http://www.scopus.com/inward/record.url?scp=84937704296&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937704296&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2014.11
DO - 10.1109/MICRO.2014.11
M3 - Conference contribution
AN - SCOPUS:84937704296
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 343
EP - 355
BT - Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
PB - IEEE Computer Society
T2 - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
Y2 - 13 December 2014 through 17 December 2014
ER -