TY - GEN
T1 - GSI
T2 - 17th International Symposium on Performance Analysis of Systems and Software, ISPASS 2016
AU - Alsop, Johnathan
AU - Sinclair, Matthew D.
AU - Komuravelli, Rakesh
AU - Adve, Sarita V.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/31
Y1 - 2016/5/31
N2 - In recent years the power wall has prevented the continued scaling of single core performance. This has lead to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous cores remains an important open research question. Many methods have been proposed to improve the efficiency of heterogeneous memory systems, but current methods for evaluating the performance effects of these innovations are limited in their ability to attribute differences in execution time to sources of latency in the memory system. Performance characterization of tightly coupled CPU-GPU systems is complicated by the high levels of parallelism present in GPU codes. Existing simulation tools provide only coarse-grained metrics which can obscure the underlying memory system interactions that cause performance differences. In this work we introduce GPU Stall Inspector (GSI), a method for identifying and visualizing the causes of GPU stalls with a focus on a tightly coupled CPU-GPU memory subsystem. We demonstrate the utility of our approach by evaluating the sources of stalls in several recent architectural innovations for tightly coupled, heterogeneous CPU-GPU systems.
AB - In recent years the power wall has prevented the continued scaling of single core performance. This has lead to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous cores remains an important open research question. Many methods have been proposed to improve the efficiency of heterogeneous memory systems, but current methods for evaluating the performance effects of these innovations are limited in their ability to attribute differences in execution time to sources of latency in the memory system. Performance characterization of tightly coupled CPU-GPU systems is complicated by the high levels of parallelism present in GPU codes. Existing simulation tools provide only coarse-grained metrics which can obscure the underlying memory system interactions that cause performance differences. In this work we introduce GPU Stall Inspector (GSI), a method for identifying and visualizing the causes of GPU stalls with a focus on a tightly coupled CPU-GPU memory subsystem. We demonstrate the utility of our approach by evaluating the sources of stalls in several recent architectural innovations for tightly coupled, heterogeneous CPU-GPU systems.
UR - http://www.scopus.com/inward/record.url?scp=84978715176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978715176&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2016.7482092
DO - 10.1109/ISPASS.2016.7482092
M3 - Conference contribution
AN - SCOPUS:84978715176
T3 - ISPASS 2016 - International Symposium on Performance Analysis of Systems and Software
SP - 172
EP - 182
BT - ISPASS 2016 - International Symposium on Performance Analysis of Systems and Software
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 April 2016 through 19 April 2016
ER -