Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs. Applications tested with our tool often showed performance parity with the compiled C version of the application for single-thread performance. With modest coarse-grained multithreading typical of today's CPU architectures, an average of 3.4x speedup on 4 processors was observed across the test applications.
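As an illustration only, and not code taken from the paper: one way to picture running fine-grained threads in software is to execute all logical threads of a block inside an ordinary loop on a single CPU thread, and to split that loop wherever the kernel contains a block-wide barrier. The C++ sketch below assumes a hypothetical kernel with a single barrier that every logical thread reaches (one example of a restricted synchronization model). The names run_block, kBlockSize, and tile are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread block of kBlockSize logical threads.
constexpr std::size_t kBlockSize = 8;

// Logical SPMD kernel, as written per thread t (assumed for illustration):
//   tile[t] = in[t];
//   barrier();
//   out[t] = tile[t] + tile[(t + 1) % kBlockSize];
//
// CPU version: one OS thread executes the whole block. The kernel body is
// split at the barrier and each region gets its own loop over thread ids.
std::vector<int> run_block(const std::vector<int>& in) {
    std::vector<int> tile(kBlockSize);  // plays the role of block-shared memory
    std::vector<int> out(kBlockSize);

    // Region 1: code before the barrier, run for every logical thread.
    for (std::size_t t = 0; t < kBlockSize; ++t) {
        tile[t] = in[t];
    }

    // The barrier needs no runtime call: region 1 has already completed
    // for all logical threads before region 2 starts.

    // Region 2: code after the barrier, run for every logical thread.
    for (std::size_t t = 0; t < kBlockSize; ++t) {
        out[t] = tile[t] + tile[(t + 1) % kBlockSize];
    }
    return out;
}
```

Calling run_block on the input {1, 2, 3, 4, 5, 6, 7, 8} gives out[0] == 3 and out[7] == 9. In a sketch like this, coarse-grained parallelism would come from handing different blocks to different CPU threads, which is not shown here.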

Original language: English (US)
Title of host publication: Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization
Pages: 111-119
Number of pages: 9
DOIs: https://doi.org/10.1145/1772954.1772971
State: Published - Jul 1 2010
Event: 8th International Symposium on Code Generation and Optimization, CGO 2010 - Toronto, ON, Canada
Duration: Apr 24 2010 to Apr 28 2010

Other

Other: 8th International Symposium on Code Generation and Optimization, CGO 2010
Country: Canada
City: Toronto, ON
Period: 4/24/10 to 4/28/10

Fingerprint

Compilation
Program processors
Thread
Programming Model
Multithreading
Many-core
Computer programming
Compiler
Parity
Parallelism
Baseline
Synchronization
Speedup
Express
Restriction
Software
Optimization
Evaluate
Model

Keywords

  • CPU
  • CUDA
  • multicore
  • SPMD

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Stratton, J. A., Grover, V., Marathe, J., Aarts, B., Murphy, M., Hu, Z., & Hwu, W-M. W. (2010). Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization (pp. 111-119) https://doi.org/10.1145/1772954.1772971

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. / Stratton, John A.; Grover, Vinod; Marathe, Jaydeep; Aarts, Bastiaan; Murphy, Mike; Hu, Ziang; Hwu, Wen-Mei W.

Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization. 2010. p. 111-119.

Stratton, JA, Grover, V, Marathe, J, Aarts, B, Murphy, M, Hu, Z & Hwu, W-MW 2010, Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. in Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization. pp. 111-119, 8th International Symposium on Code Generation and Optimization, CGO 2010, Toronto, ON, Canada, 4/24/10. https://doi.org/10.1145/1772954.1772971

Stratton JA, Grover V, Marathe J, Aarts B, Murphy M, Hu Z et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization. 2010. p. 111-119 https://doi.org/10.1145/1772954.1772971

Stratton, John A. ; Grover, Vinod ; Marathe, Jaydeep ; Aarts, Bastiaan ; Murphy, Mike ; Hu, Ziang ; Hwu, Wen-Mei W. / Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization. 2010. pp. 111-119

@inproceedings{88d676052c294b3a88c1078f26fe10fb,
title = "Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs",
abstract = "In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs. Applications tested with our tool often showed performance parity with the compiled C version of the application for single-thread performance. With modest coarse-grained multithreading typical of today's CPU architectures, an average of 3.4x speedup on 4 processors was observed across the test applications.",
keywords = "CPU, CUDA, multicore, SPMD",
author = "Stratton, {John A.} and Vinod Grover and Jaydeep Marathe and Bastiaan Aarts and Mike Murphy and Ziang Hu and Hwu, {Wen-Mei W}",
year = "2010",
month = "7",
day = "1",
doi = "10.1145/1772954.1772971",
language = "English (US)",
isbn = "9781605586359",
pages = "111--119",
booktitle = "Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization",

}

TY - GEN

T1 - Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

AU - Stratton, John A.

AU - Grover, Vinod

AU - Marathe, Jaydeep

AU - Aarts, Bastiaan

AU - Murphy, Mike

AU - Hu, Ziang

AU - Hwu, Wen-Mei W

PY - 2010/7/1

Y1 - 2010/7/1

AB - In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs. Applications tested with our tool often showed performance parity with the compiled C version of the application for single-thread performance. With modest coarse-grained multithreading typical of today's CPU architectures, an average of 3.4x speedup on 4 processors was observed across the test applications.

KW - CPU

KW - CUDA

KW - multicore

KW - SPMD

UR - http://www.scopus.com/inward/record.url?scp=77953978573&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77953978573&partnerID=8YFLogxK

U2 - 10.1145/1772954.1772971

DO - 10.1145/1772954.1772971

M3 - Conference contribution

AN - SCOPUS:77953978573

SN - 9781605586359

SP - 111

EP - 119

BT - Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization

ER -