Adaptive Cache Management for Energy-Efficient GPU Computing

Xuhao Chen, Li Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, which limits system performance and energy efficiency. The massive number of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement) compared to the baseline GPU architecture and optimal static warp throttling, respectively.
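The abstract outlines two coordinated mechanisms: bypassing cache fills whose reuse distance exceeds what the cache can retain, and throttling the number of active warps when contention is detected. The toy Python sketch below illustrates the general intuition only; the class, the reuse-distance proxy, and the throttling rule are invented for illustration and are not the paper's actual hardware policy.

```python
from collections import OrderedDict

class BypassCache:
    """Toy fully-associative LRU cache that bypasses a fill when the
    block's observed reuse distance exceeds the cache capacity, on the
    reasoning that such a block would be evicted before it is reused."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()   # cached addresses in LRU order
        self.last_access = {}      # addr -> logical time of last access
        self.time = 0
        self.hits = self.misses = self.bypasses = 0

    def access(self, addr):
        self.time += 1
        if addr in self.lru:
            self.lru.move_to_end(addr)      # refresh LRU position
            self.hits += 1
        else:
            self.misses += 1
            # Crude reuse-distance proxy: accesses since addr was last seen.
            dist = self.time - self.last_access.get(addr, -10**9)
            if dist <= self.capacity:
                # Short reuse distance: worth caching; evict LRU if full.
                if len(self.lru) >= self.capacity:
                    self.lru.popitem(last=False)
                self.lru[addr] = True
            else:
                # Long reuse distance (or first touch): bypass the cache.
                self.bypasses += 1
        self.last_access[addr] = self.time

def throttle(active_warps, congested, min_warps=2, max_warps=48):
    """Illustrative throttling rule: halve the active warps under
    congestion, otherwise ramp back up by one warp per decision."""
    if congested:
        return max(min_warps, active_warps // 2)
    return min(max_warps, active_warps + 1)
```

Under this sketch, a streaming access pattern bypasses every fill (protecting cached data), while a small hot working set is cached after its reuse distance is observed to be short; the real design additionally uses a predictor to pick the warp count, which is not modeled here.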

Original language: English (US)
Title of host publication: Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
Publisher: IEEE Computer Society
Pages: 343-355
Number of pages: 13
Edition: January
ISBN (Electronic): 9781479969982
DOI: 10.1109/MICRO.2014.11
State: Published - Jan 15 2015
Event: 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014 - Cambridge, United Kingdom
Duration: Dec 13 2014 - Dec 17 2014

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Number: January
Volume: 2015-January
ISSN (Print): 1072-4451

Other

Other: 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
Country: United Kingdom
City: Cambridge
Period: 12/13/14 - 12/17/14

Keywords

  • bypass
  • cache management
  • GPGPU
  • warp throttling

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Chen, X., Chang, L. W., Rodrigues, C. I., Lv, J., Wang, Z., & Hwu, W-M. W. (2015). Adaptive Cache Management for Energy-Efficient GPU Computing. In Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014 (January ed., pp. 343-355). [7011400] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 2015-January, No. January). IEEE Computer Society. https://doi.org/10.1109/MICRO.2014.11

@inproceedings{3326b21d60e342e8ae8a27620638e0d6,
title = "Adaptive Cache Management for Energy-Efficient GPU Computing",
abstract = "With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, which limits system performance and energy efficiency. The massive number of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74{\%} and 17{\%} (maximum 661{\%} and 44{\%} IPC improvement) compared to the baseline GPU architecture and optimal static warp throttling, respectively.",
keywords = "bypass, cache management, GPGPU, warp throttling",
author = "Xuhao Chen and Chang, {Li Wen} and Rodrigues, {Christopher I.} and Jie Lv and Zhiying Wang and Hwu, {Wen-Mei W}",
year = "2015",
month = "1",
day = "15",
doi = "10.1109/MICRO.2014.11",
language = "English (US)",
series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
publisher = "IEEE Computer Society",
number = "January",
pages = "343--355",
booktitle = "Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014",
edition = "January",

}

TY - GEN

T1 - Adaptive Cache Management for Energy-Efficient GPU Computing

AU - Chen, Xuhao

AU - Chang, Li Wen

AU - Rodrigues, Christopher I.

AU - Lv, Jie

AU - Wang, Zhiying

AU - Hwu, Wen-Mei W

PY - 2015/1/15

Y1 - 2015/1/15

N2 - With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, which limits system performance and energy efficiency. The massive number of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement) compared to the baseline GPU architecture and optimal static warp throttling, respectively.

AB - With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, which limits system performance and energy efficiency. The massive number of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid oversaturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement) compared to the baseline GPU architecture and optimal static warp throttling, respectively.

KW - bypass

KW - cache management

KW - GPGPU

KW - warp throttling

UR - http://www.scopus.com/inward/record.url?scp=84937704296&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937704296&partnerID=8YFLogxK

U2 - 10.1109/MICRO.2014.11

DO - 10.1109/MICRO.2014.11

M3 - Conference contribution

AN - SCOPUS:84937704296

T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SP - 343

EP - 355

BT - Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014

PB - IEEE Computer Society

ER -