MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs

John A. Stratton, Sam S. Stone, Wen-Mei W. Hwu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

CUDA is a data parallel programming model that supports several key abstractions - thread blocks, hierarchical memory and barrier synchronization - for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs fission to eliminate barrier synchronizations, and converts scalar references to thread-local data to replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
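The three transformations named in the abstract can be illustrated on a toy kernel. The following is a minimal hand-written sketch in C, not actual MCUDA output: the kernel, the names `BLOCK_DIM`, `toy_kernel_block`, and `run_grid`, and the block size are all hypothetical. It assumes a one-dimensional thread block and runs blocks serially, whereas the framework's runtime system executes them in parallel.

```c
#include <assert.h>

#define BLOCK_DIM 4 /* hypothetical number of threads per block */

/*
 * Hypothetical CUDA original, shown only for comparison:
 *
 *   __global__ void toy_kernel(int *out, const int *in) {
 *       __shared__ int tile[BLOCK_DIM];
 *       int g = blockIdx.x * BLOCK_DIM + threadIdx.x;
 *       int v = in[g] * 10;                 // thread-local scalar
 *       tile[threadIdx.x] = in[g];
 *       __syncthreads();                    // barrier
 *       out[g] = tile[BLOCK_DIM - 1 - threadIdx.x] + v;
 *   }
 */

/* Executes one whole thread block on a single CPU thread. */
static void toy_kernel_block(int *out, const int *in, int block_idx)
{
    int tile[BLOCK_DIM]; /* was __shared__: one copy per block */
    int v[BLOCK_DIM];    /* was scalar v: replicated per logical thread,
                            because its value is live across the barrier */
    int t;

    /* Explicit thread loop over the code before the barrier. */
    for (t = 0; t < BLOCK_DIM; t++) {
        int g = block_idx * BLOCK_DIM + t;
        v[t] = in[g] * 10;
        tile[t] = in[g];
    }

    /* Loop fission at the barrier: every logical thread has finished the
       first region before any logical thread starts the second one. */
    for (t = 0; t < BLOCK_DIM; t++) {
        int g = block_idx * BLOCK_DIM + t;
        out[g] = tile[BLOCK_DIM - 1 - t] + v[t];
    }
}

/* Serial stand-in for a runtime that would distribute blocks over cores. */
static void run_grid(int *out, const int *in, int num_blocks)
{
    int b;
    assert(num_blocks >= 0);
    for (b = 0; b < num_blocks; b++) {
        toy_kernel_block(out, in, b);
    }
}
```

Calling `run_grid(out, in, 2)` on an eight-element input processes two blocks of four logical threads each. Blocks are independent in this sketch, so the outer loop in `run_grid` is the natural place to hand work to separate CPU threads.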

Original language: English (US)
Title of host publication: Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers
Pages: 16-30
Number of pages: 15
DOIs: https://doi.org/10.1007/978-3-540-89740-8_2
State: Published - Dec 1 2008
Event: 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2008 - Edmonton, AB, Canada
Duration: Jul 31 2008 - Aug 2 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5335 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2008
Country: Canada
City: Edmonton, AB
Period: 7/31/08 - 8/2/08

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)


  • Cite this

    Stratton, J. A., Stone, S. S., & Hwu, W-M. W. (2008). MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs. In Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers (pp. 16-30). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5335 LNCS). https://doi.org/10.1007/978-3-540-89740-8_2