TY - GEN
T1 - Memory scheduling towards high-throughput cooperative heterogeneous computing
AU - Wang, Hao
AU - Singh, Ripudaman
AU - Schulte, Michael J.
AU - Kim, Nam Sung
N1 - Copyright:
Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
N2 - Technology scaling enables the integration of both the CPU and the GPU into a single chip for higher throughput and energy efficiency. In such a single-chip heterogeneous processor (SCHP), memory bandwidth is the most critically shared resource and requires judicious management to maximize throughput. Previous studies on memory scheduling for SCHPs have focused on the scenario where multiple applications run on the CPU and the GPU, respectively, which we denote as a multi-tasking scenario. However, another increasingly important usage scenario for SCHPs is cooperative heterogeneous computing, where a single parallel application is partitioned between the CPU and the GPU such that the overall throughput is maximized. In previous studies on memory scheduling techniques for chip multi-processors (CMPs) and SCHPs, the first-ready first-come-first-served (FR-FCFS) scheduling policy was used as a baseline considered inferior due to its unfairness. However, in a cooperative heterogeneous computing scenario, we first demonstrate that FR-FCFS actually offers nearly 10% higher throughput than two recently proposed memory scheduling techniques designed for a multi-tasking scenario. Second, based on our analysis of memory access characteristics in a cooperative heterogeneous computing scenario, we propose optimization techniques that enhance row-buffer locality by 10%, reduce the service latency of CPU memory requests by 26%, and improve overall throughput by up to 8% compared to FR-FCFS.
AB - Technology scaling enables the integration of both the CPU and the GPU into a single chip for higher throughput and energy efficiency. In such a single-chip heterogeneous processor (SCHP), memory bandwidth is the most critically shared resource and requires judicious management to maximize throughput. Previous studies on memory scheduling for SCHPs have focused on the scenario where multiple applications run on the CPU and the GPU, respectively, which we denote as a multi-tasking scenario. However, another increasingly important usage scenario for SCHPs is cooperative heterogeneous computing, where a single parallel application is partitioned between the CPU and the GPU such that the overall throughput is maximized. In previous studies on memory scheduling techniques for chip multi-processors (CMPs) and SCHPs, the first-ready first-come-first-served (FR-FCFS) scheduling policy was used as a baseline considered inferior due to its unfairness. However, in a cooperative heterogeneous computing scenario, we first demonstrate that FR-FCFS actually offers nearly 10% higher throughput than two recently proposed memory scheduling techniques designed for a multi-tasking scenario. Second, based on our analysis of memory access characteristics in a cooperative heterogeneous computing scenario, we propose optimization techniques that enhance row-buffer locality by 10%, reduce the service latency of CPU memory requests by 26%, and improve overall throughput by up to 8% compared to FR-FCFS.
KW - heterogeneous processor
KW - memory scheduling
UR - http://www.scopus.com/inward/record.url?scp=84907070309&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907070309&partnerID=8YFLogxK
U2 - 10.1145/2628071.2628096
DO - 10.1145/2628071.2628096
M3 - Conference contribution
AN - SCOPUS:84907070309
SN - 9781450328098
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 331
EP - 341
BT - PACT 2014 - Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd International Conference on Parallel Architectures and Compilation Techniques, PACT 2014
Y2 - 24 August 2014 through 27 August 2014
ER -