Design evaluation of OpenCL compiler framework for coarse-grained reconfigurable arrays

Hee Seok Kim, Minwook Ahn, John A. Stratton, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

OpenCL is becoming one of the most popular parallel programming languages because it provides a standardized and portable programming model. However, adopting OpenCL for Coarse-Grained Reconfigurable Arrays (CGRAs) is challenging because their architectural capabilities diverge from those of GPUs. In particular, CGRAs are designed to accelerate loop execution by software pipelining on a grid of functional units, exploiting instruction-level parallelism. This is vastly different from a GPU, which executes data-parallel kernels using a large number of parallel threads. Therefore, an OpenCL compiler and runtime for CGRAs must map the threaded parallel programming model to a loop-parallel execution model so that the architecture can best utilize its resources. In this paper, we propose and evaluate a design for an OpenCL compiler framework for CGRAs. The proposed design is composed of a serializer and a post optimizer. The serializer transforms parallel execution of work-items into an equivalent loop-based iterative execution in order to avoid expensive multithreading on CGRAs. The resulting code is further optimized by the post optimizer to maximize the coverage of software-pipelinable innermost loops. To achieve this goal, the post optimizer can apply various loop-level optimizations to the loops that the serializer introduces for iterative execution of OpenCL kernels. We analyze the proposed framework on a set of well-studied standard OpenCL kernels by comparing the performance of various implementations of the benchmarks.
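The serialization idea in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: the kernel `mirror_add`, the function names, and the data are assumptions chosen for illustration, and Python stands in for the C-like code a real compiler would emit. It shows one common way to turn work-items into loops, which is to split the kernel body at each barrier into separate loops over the work-item index and to expand per-work-item private variables into arrays so their values survive across the split.

```python
# Hypothetical sketch: the kernel, function names, and data are assumptions
# made for illustration; they are not taken from the paper.
#
# Conceptual OpenCL kernel being serialized (one work-group, 1-D):
#
#   __kernel void mirror_add(__global const float *src, __global float *dst,
#                            __local float *tile) {
#       int lid = get_local_id(0);
#       int n   = get_local_size(0);
#       float x = 2.0f * src[lid];        // private to each work-item
#       tile[lid] = x;
#       barrier(CLK_LOCAL_MEM_FENCE);     // all writes to tile must finish
#       dst[lid] = tile[n - 1 - lid] + x;
#   }


def serialized_work_group(src, local_size):
    """Run one work-group as two sequential loops split at the barrier."""
    tile = [0.0] * local_size
    # The private scalar x becomes an array because it is live across the barrier.
    x_private = [0.0] * local_size
    dst = [0.0] * local_size

    # Loop 1: the code before the barrier, once per work-item.
    for lid in range(local_size):
        x_private[lid] = 2.0 * src[lid]
        tile[lid] = x_private[lid]

    # The barrier is implied here: loop 1 has finished for every work-item.

    # Loop 2: the code after the barrier, once per work-item.
    for lid in range(local_size):
        dst[lid] = tile[local_size - 1 - lid] + x_private[lid]
    return dst


def naive_single_loop(src, local_size):
    """Incorrect serialization that ignores the barrier (for contrast)."""
    tile = [0.0] * local_size
    dst = [0.0] * local_size
    for lid in range(local_size):
        x = 2.0 * src[lid]
        tile[lid] = x
        # Reads tile entries that later iterations have not written yet.
        dst[lid] = tile[local_size - 1 - lid] + x
    return dst
```

In this sketch each of the two loops in `serialized_work_group` is a simple counted loop whose iterations do not depend on one another, which is the kind of innermost loop the abstract describes as a target for software pipelining. The `naive_single_loop` variant is included only to show why the split at the barrier is needed.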

Original language: English (US)
Title of host publication: FPT 2012 - 2012 International Conference on Field-Programmable Technology
Pages: 313-320
Number of pages: 8
DOIs: https://doi.org/10.1109/FPT.2012.6412155
State: Published - Dec 1 2012
Event: 2012 International Conference on Field-Programmable Technology, FPT 2012 - Seoul, Korea, Republic of
Duration: Dec 10 2012 to Dec 12 2012

Publication series

Name: FPT 2012 - 2012 International Conference on Field-Programmable Technology

Other

Other: 2012 International Conference on Field-Programmable Technology, FPT 2012
Country: Korea, Republic of
City: Seoul
Period: 12/10/12 to 12/12/12

Keywords

  • CGRA
  • Coarse-Grained Reconfigurable Arrays
  • GPU
  • OpenCL
  • RP
  • SRP
  • Samsung Reconfigurable Processor

ASJC Scopus subject areas

  • Logic

Cite this

Kim, H. S., Ahn, M., Stratton, J. A., & Hwu, W-M. W. (2012). Design evaluation of OpenCL compiler framework for coarse-grained reconfigurable arrays. In FPT 2012 - 2012 International Conference on Field-Programmable Technology (pp. 313-320). [6412155] (FPT 2012 - 2012 International Conference on Field-Programmable Technology). https://doi.org/10.1109/FPT.2012.6412155

@inproceedings{dc60180ea81e4dfeaf02d5603d8288d5,
title = "Design evaluation of OpenCL compiler framework for coarse-grained reconfigurable arrays",
abstract = "OpenCL is becoming one of the most popular parallel programming languages because it provides a standardized and portable programming model. However, adopting OpenCL for Coarse-Grained Reconfigurable Arrays (CGRAs) is challenging because their architectural capabilities diverge from those of GPUs. In particular, CGRAs are designed to accelerate loop execution by software pipelining on a grid of functional units, exploiting instruction-level parallelism. This is vastly different from a GPU, which executes data-parallel kernels using a large number of parallel threads. Therefore, an OpenCL compiler and runtime for CGRAs must map the threaded parallel programming model to a loop-parallel execution model so that the architecture can best utilize its resources. In this paper, we propose and evaluate a design for an OpenCL compiler framework for CGRAs. The proposed design is composed of a serializer and a post optimizer. The serializer transforms parallel execution of work-items into an equivalent loop-based iterative execution in order to avoid expensive multithreading on CGRAs. The resulting code is further optimized by the post optimizer to maximize the coverage of software-pipelinable innermost loops. To achieve this goal, the post optimizer can apply various loop-level optimizations to the loops that the serializer introduces for iterative execution of OpenCL kernels. We analyze the proposed framework on a set of well-studied standard OpenCL kernels by comparing the performance of various implementations of the benchmarks.",
keywords = "CGRA, Coarse-Grained Reconfigurable Arrays, GPU, OpenCL, RP, SRP, Samsung Reconfigurable Processor",
author = "Kim, {Hee Seok} and Ahn, Minwook and Stratton, {John A.} and Hwu, {Wen-Mei W.}",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/FPT.2012.6412155",
language = "English (US)",
isbn = "9781467328449",
series = "FPT 2012 - 2012 International Conference on Field-Programmable Technology",
pages = "313--320",
booktitle = "FPT 2012 - 2012 International Conference on Field-Programmable Technology",

}

TY - GEN

T1 - Design evaluation of OpenCL compiler framework for coarse-grained reconfigurable arrays

AU - Kim, Hee Seok

AU - Ahn, Minwook

AU - Stratton, John A.

AU - Hwu, Wen-Mei W

PY - 2012/12/1

Y1 - 2012/12/1

N2 - OpenCL is becoming one of the most popular parallel programming languages because it provides a standardized and portable programming model. However, adopting OpenCL for Coarse-Grained Reconfigurable Arrays (CGRAs) is challenging because their architectural capabilities diverge from those of GPUs. In particular, CGRAs are designed to accelerate loop execution by software pipelining on a grid of functional units, exploiting instruction-level parallelism. This is vastly different from a GPU, which executes data-parallel kernels using a large number of parallel threads. Therefore, an OpenCL compiler and runtime for CGRAs must map the threaded parallel programming model to a loop-parallel execution model so that the architecture can best utilize its resources. In this paper, we propose and evaluate a design for an OpenCL compiler framework for CGRAs. The proposed design is composed of a serializer and a post optimizer. The serializer transforms parallel execution of work-items into an equivalent loop-based iterative execution in order to avoid expensive multithreading on CGRAs. The resulting code is further optimized by the post optimizer to maximize the coverage of software-pipelinable innermost loops. To achieve this goal, the post optimizer can apply various loop-level optimizations to the loops that the serializer introduces for iterative execution of OpenCL kernels. We analyze the proposed framework on a set of well-studied standard OpenCL kernels by comparing the performance of various implementations of the benchmarks.

KW - CGRA

KW - Coarse-Grained Reconfigurable Arrays

KW - GPU

KW - OpenCL

KW - RP

KW - SRP

KW - Samsung Reconfigurable Processor

UR - http://www.scopus.com/inward/record.url?scp=84874038195&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874038195&partnerID=8YFLogxK

U2 - 10.1109/FPT.2012.6412155

DO - 10.1109/FPT.2012.6412155

M3 - Conference contribution

AN - SCOPUS:84874038195

SN - 9781467328449

T3 - FPT 2012 - 2012 International Conference on Field-Programmable Technology

SP - 313

EP - 320

BT - FPT 2012 - 2012 International Conference on Field-Programmable Technology

ER -