CUDA-Lite: Reducing GPU programming complexity

Sain Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, Wen Mei W. Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance. Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-Lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand coding.

Original language: English (US)
Title of host publication: Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers
Pages: 1-15
Number of pages: 15
DOIs: https://doi.org/10.1007/978-3-540-89740-8_1
State: Published - Dec 1 2008
Event: 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2008 - Edmonton, AB, Canada
Duration: Jul 31 2008 to Aug 2 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5335 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2008
Country: Canada
City: Edmonton, AB
Period: 7/31/08 to 8/2/08

Fingerprint

Computer programming
Programming
Many-core
Data storage equipment
Coding
Memory Hierarchy
Programming Environments
Data Transfer
Parallel Systems
Leverage
Annotation
Enhancement
Maximise
Industry
Graphics processing unit

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Ueng, S. Z., Lathara, M., Baghsorkhi, S. S., & Hwu, W. M. W. (2008). CUDA-Lite: Reducing GPU programming complexity. In Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers (pp. 1-15). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5335 LNCS). https://doi.org/10.1007/978-3-540-89740-8_1

@inproceedings{d3d9289df7ab4d2185b0d7015662af6f,
title = "CUDA-Lite: Reducing GPU programming complexity",
abstract = "The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance. Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand coding.",
author = "Ueng, {Sain Zee} and Melvin Lathara and Baghsorkhi, {Sara S.} and Hwu, {Wen Mei W.}",
year = "2008",
month = "12",
day = "1",
doi = "10.1007/978-3-540-89740-8_1",
language = "English (US)",
isbn = "3540897399",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "1--15",
booktitle = "Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers",

}

TY - GEN

T1 - CUDA-Lite

T2 - Reducing GPU programming complexity

AU - Ueng, Sain Zee

AU - Lathara, Melvin

AU - Baghsorkhi, Sara S.

AU - Hwu, Wen Mei W.

PY - 2008/12/1

Y1 - 2008/12/1

N2 - The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance. Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand coding.

UR - http://www.scopus.com/inward/record.url?scp=58449127539&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58449127539&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-89740-8_1

DO - 10.1007/978-3-540-89740-8_1

M3 - Conference contribution

AN - SCOPUS:58449127539

SN - 3540897399

SN - 9783540897392

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1

EP - 15

BT - Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers

ER -