TY - GEN
T1 - Automating CUDA Synchronization via Program Transformation
AU - Wu, Mingyuan
AU - Zhang, Lingming
AU - Liu, Cong
AU - Tan, Shin Hwei
AU - Zhang, Yuqun
N1 - Funding Information:
This work is partially supported by the National Natural Science Foundation of China (Grant No. 61902169 and No. 61902170), Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), and Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284 and No. JCYJ20170817110848086). This work is also partially supported by National Science Foundation under Grant No. CCF-1763906 and Amazon. The authors also thank Yicheng Ouyang for the help with editing the paper.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - While CUDA has been the most popular parallel computing platform and programming model for general purpose GPU computing, CUDA synchronization undergoes significant challenges for GPU programmers due to its intricate parallel computing mechanism and coding practices. In this paper, we propose AuCS, the first general framework to automate synchronization for CUDA kernel functions. AuCS transforms the original LLVM-level CUDA program control flow graph in a semantic-preserving manner for exploring the possible barrier function locations. Accordingly, AuCS develops mechanisms to correctly place barrier functions for automating synchronization in multiple erroneous (challenging-to-be-detected) synchronization scenarios, including data race, barrier divergence, and redundant barrier functions. To evaluate the effectiveness and efficiency of AuCS, we conduct an extensive set of experiments and the results demonstrate that AuCS can automate 20 out of 24 erroneous synchronization scenarios.
AB - While CUDA has been the most popular parallel computing platform and programming model for general purpose GPU computing, CUDA synchronization undergoes significant challenges for GPU programmers due to its intricate parallel computing mechanism and coding practices. In this paper, we propose AuCS, the first general framework to automate synchronization for CUDA kernel functions. AuCS transforms the original LLVM-level CUDA program control flow graph in a semantic-preserving manner for exploring the possible barrier function locations. Accordingly, AuCS develops mechanisms to correctly place barrier functions for automating synchronization in multiple erroneous (challenging-to-be-detected) synchronization scenarios, including data race, barrier divergence, and redundant barrier functions. To evaluate the effectiveness and efficiency of AuCS, we conduct an extensive set of experiments and the results demonstrate that AuCS can automate 20 out of 24 erroneous synchronization scenarios.
KW - CUDA
KW - Program repair
KW - Program transformation
KW - Synchronization automation
UR - http://www.scopus.com/inward/record.url?scp=85078956332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078956332&partnerID=8YFLogxK
U2 - 10.1109/ASE.2019.00075
DO - 10.1109/ASE.2019.00075
M3 - Conference contribution
AN - SCOPUS:85078956332
T3 - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
SP - 748
EP - 759
BT - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Y2 - 10 November 2019 through 15 November 2019
ER -