Automatic Parallelization of GPU Applications Using OpenCL

Lizandro D. Solano-Quinde, Brett Bode, Arun K. Somani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Graphics Processing Units (GPUs) have been successfully used to accelerate scientific applications due to their computational power and the availability of programming languages that make writing scientific applications for GPUs more approachable. However, since the GPU programming model requires offloading all the data to GPU memory, the memory footprint of the application is limited to the size of the GPU memory. Multi-GPU systems can make memory-limited problems tractable by distributing the computation and data among the available GPUs. Applications written for single-GPU systems can be parallelized (i) at runtime, through an environment that captures the memory operations and kernel calls and distributes them among the available GPUs, or (ii) at compile time, through a pre-compiler that transforms the application to decompose the data and computation among the available GPUs. In this paper we propose a framework and implement a tool that transforms an OpenCL application written to run on single-GPU systems into one that runs on multi-GPU systems. Based on data dependency and data usage analysis, the application is transformed to decompose data and computation among the available GPUs. To reduce the data transfer overhead, computation-communication overlapping techniques are used. We tested our tool using two applications with different data transfer requirements: for the application with no data transfer requirements, a linear speedup is achieved, while for the application with data transfers, computation-communication overlapping reduces the communication overhead by 40%.
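
The abstract does not give the tool's actual decomposition or scheduling scheme, so the following is only an illustrative sketch of the two ideas it names: splitting a one-dimensional index range among several GPUs, and estimating how much time is saved when the transfer of the next chunk overlaps with the computation of the current one. The function names `partition_range` and `pipelined_time`, the even block split, and the two-stage timing model are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple


def partition_range(total_items: int, num_devices: int) -> List[Tuple[int, int]]:
    """Split [0, total_items) into contiguous (offset, size) chunks, one per device.

    Hypothetical helper: an even block decomposition, with the remainder
    spread over the first devices.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    base, extra = divmod(total_items, num_devices)
    chunks = []
    offset = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        chunks.append((offset, size))
        offset += size
    return chunks


def serial_time(transfer: List[float], compute: List[float]) -> float:
    """Time when every chunk is transferred and then computed, with no overlap."""
    return sum(transfer) + sum(compute)


def pipelined_time(transfer: List[float], compute: List[float]) -> float:
    """Estimated time when chunk i+1 is transferred while chunk i is computed.

    Simplified model (an assumption): each step waits for both the current
    computation and the next transfer to finish before moving on.
    """
    if len(transfer) != len(compute):
        raise ValueError("transfer and compute must have the same length")
    if not transfer:
        return 0.0
    total = transfer[0]  # the first transfer cannot be hidden
    for i in range(len(compute)):
        next_transfer = transfer[i + 1] if i + 1 < len(transfer) else 0.0
        total += max(compute[i], next_transfer)
    return total
```

For example, `partition_range(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`; in an OpenCL host program each pair could serve as the global work offset and global work size of the kernel launch on one device. With three chunks that each take 1 time unit to transfer and 2 to compute, `serial_time` gives 9 and `pipelined_time` gives 7 under this model.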

Original language: English (US)
Title of host publication: Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015
Editors: Alberto Sanchez, Carlos Monsalve, Zenon Chaczko
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-283
Number of pages: 8
ISBN (Electronic): 9781479975884
DOIs: https://doi.org/10.1109/APCASE.2015.56
State: Published - Oct 1 2015
Event: Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015 - Quito, Pichincha, Ecuador
Duration: Jul 14 2015 to Jul 16 2015

Publication series

Name: Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015

Other

Other: Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015
Country: Ecuador
City: Quito, Pichincha
Period: 7/14/15 to 7/16/15

Fingerprint

Data transfer
Data storage equipment
Graphics processing unit
Communication
Computer programming languages
Computer systems
Availability

Keywords

  • GPU
  • OpenCL
  • Program Transformation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Control and Systems Engineering

Cite this

Solano-Quinde, L. D., Bode, B., & Somani, A. K. (2015). Automatic Parallelization of GPU Applications Using OpenCL. In A. Sanchez, C. Monsalve, & Z. Chaczko (Eds.), Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015 (pp. 276-283). [7287032] (Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APCASE.2015.56

Automatic Parallelization of GPU Applications Using OpenCL. / Solano-Quinde, Lizandro D.; Bode, Brett; Somani, Arun K.

Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015. ed. / Alberto Sanchez; Carlos Monsalve; Zenon Chaczko. Institute of Electrical and Electronics Engineers Inc., 2015. p. 276-283 7287032 (Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015).

Solano-Quinde, LD, Bode, B & Somani, AK 2015, Automatic Parallelization of GPU Applications Using OpenCL. in A Sanchez, C Monsalve & Z Chaczko (eds), Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015., 7287032, Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015, Institute of Electrical and Electronics Engineers Inc., pp. 276-283, Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015, Quito, Pichincha, Ecuador, 7/14/15. https://doi.org/10.1109/APCASE.2015.56
Solano-Quinde LD, Bode B, Somani AK. Automatic Parallelization of GPU Applications Using OpenCL. In Sanchez A, Monsalve C, Chaczko Z, editors, Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 276-283. 7287032. (Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015). https://doi.org/10.1109/APCASE.2015.56
Solano-Quinde, Lizandro D. ; Bode, Brett ; Somani, Arun K. / Automatic Parallelization of GPU Applications Using OpenCL. Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015. editor / Alberto Sanchez ; Carlos Monsalve ; Zenon Chaczko. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 276-283 (Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015).
@inproceedings{074df967465c416f9266f10ba18b1ec5,
title = "Automatic Parallelization of GPU Applications Using OpenCL",
abstract = "Graphics Processing Units (GPUs) have been successfully used to accelerate scientific applications due to their computational power and the availability of programming languages that make writing scientific applications for GPUs more approachable. However, since the GPU programming model requires offloading all the data to GPU memory, the memory footprint of the application is limited to the size of the GPU memory. Multi-GPU systems can make memory-limited problems tractable by distributing the computation and data among the available GPUs. Applications written for single-GPU systems can be parallelized (i) at runtime, through an environment that captures the memory operations and kernel calls and distributes them among the available GPUs, or (ii) at compile time, through a pre-compiler that transforms the application to decompose the data and computation among the available GPUs. In this paper we propose a framework and implement a tool that transforms an OpenCL application written to run on single-GPU systems into one that runs on multi-GPU systems. Based on data dependency and data usage analysis, the application is transformed to decompose data and computation among the available GPUs. To reduce the data transfer overhead, computation-communication overlapping techniques are used. We tested our tool using two applications with different data transfer requirements: for the application with no data transfer requirements, a linear speedup is achieved, while for the application with data transfers, computation-communication overlapping reduces the communication overhead by 40{\%}.",
keywords = "GPU, OpenCL, Program Transformation",
author = "Solano-Quinde, {Lizandro D.} and Brett Bode and Somani, {Arun K.}",
year = "2015",
month = "10",
day = "1",
doi = "10.1109/APCASE.2015.56",
language = "English (US)",
series = "Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "276--283",
editor = "Alberto Sanchez and Carlos Monsalve and Zenon Chaczko",
booktitle = "Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015",
address = "United States",

}

TY - GEN

T1 - Automatic Parallelization of GPU Applications Using OpenCL

AU - Solano-Quinde, Lizandro D.

AU - Bode, Brett

AU - Somani, Arun K.

PY - 2015/10/1

Y1 - 2015/10/1

N2 - Graphics Processing Units (GPUs) have been successfully used to accelerate scientific applications due to their computational power and the availability of programming languages that make writing scientific applications for GPUs more approachable. However, since the GPU programming model requires offloading all the data to GPU memory, the memory footprint of the application is limited to the size of the GPU memory. Multi-GPU systems can make memory-limited problems tractable by distributing the computation and data among the available GPUs. Applications written for single-GPU systems can be parallelized (i) at runtime, through an environment that captures the memory operations and kernel calls and distributes them among the available GPUs, or (ii) at compile time, through a pre-compiler that transforms the application to decompose the data and computation among the available GPUs. In this paper we propose a framework and implement a tool that transforms an OpenCL application written to run on single-GPU systems into one that runs on multi-GPU systems. Based on data dependency and data usage analysis, the application is transformed to decompose data and computation among the available GPUs. To reduce the data transfer overhead, computation-communication overlapping techniques are used. We tested our tool using two applications with different data transfer requirements: for the application with no data transfer requirements, a linear speedup is achieved, while for the application with data transfers, computation-communication overlapping reduces the communication overhead by 40%.

KW - GPU

KW - OpenCL

KW - Program Transformation

UR - http://www.scopus.com/inward/record.url?scp=84959361463&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959361463&partnerID=8YFLogxK

U2 - 10.1109/APCASE.2015.56

DO - 10.1109/APCASE.2015.56

M3 - Conference contribution

AN - SCOPUS:84959361463

T3 - Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015

SP - 276

EP - 283

BT - Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015

A2 - Sanchez, Alberto

A2 - Monsalve, Carlos

A2 - Chaczko, Zenon

PB - Institute of Electrical and Electronics Engineers Inc.

ER -