TY - GEN
T1 - MCUDA
T2 - 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2008
AU - Stratton, John A.
AU - Stone, Sam S.
AU - Hwu, Wen-mei W.
PY - 2008
Y1 - 2008
AB - CUDA is a data-parallel programming model that supports several key abstractions (thread blocks, hierarchical memory, and barrier synchronization) for writing applications. This model has proven effective in programming GPUs. In this paper, we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared-memory multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. While preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs loop fission to eliminate barrier synchronizations, and converts scalar references to thread-local data into replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
UR - http://www.scopus.com/inward/record.url?scp=58449109179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58449109179&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-89740-8_2
DO - 10.1007/978-3-540-89740-8_2
M3 - Conference contribution
AN - SCOPUS:58449109179
SN - 3540897399
SN - 9783540897392
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 30
BT - Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers
Y2 - 31 July 2008 through 2 August 2008
ER -