High-performance CUDA kernel execution on FPGAs

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
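The abstract does not spell out how SPMD thread blocks become C code, so the following is only a minimal sketch of one common way such a transformation can look, assuming a simple vector-add kernel: the implicit CUDA threads of a block become an explicit loop, and the grid becomes a loop over independent blocks. The kernel, the function names (`vec_add_block`, `vec_add_grid`), and the sizes `BLOCK_DIM` and `GRID_DIM` are hypothetical and are not taken from the paper or from AutoPilot.

```c
#include <stddef.h>

#define BLOCK_DIM 4 /* hypothetical number of threads per block */
#define GRID_DIM  2 /* hypothetical number of thread blocks     */

/*
 * Assumed CUDA original, shown only for reference:
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       c[i] = a[i] + b[i];
 *   }
 */

/* One thread block as plain C: the implicit threads become an explicit
 * loop over thread_idx, with block_idx passed in as a parameter. */
void vec_add_block(int block_idx, const float *a, const float *b, float *c)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx;
        c[i] = a[i] + b[i];
    }
}

/* The grid as a loop over blocks. Blocks do not depend on each other here,
 * which is the coarse-grained parallelism a synthesis tool could, in
 * principle, map onto several hardware cores. */
void vec_add_grid(const float *a, const float *b, float *c)
{
    for (int block_idx = 0; block_idx < GRID_DIM; ++block_idx) {
        vec_add_block(block_idx, a, b, c);
    }
}
```

Calling `vec_add_grid(a, b, c)` on arrays of `GRID_DIM * BLOCK_DIM` elements computes the same result the CUDA kernel would for a launch of `GRID_DIM` blocks of `BLOCK_DIM` threads. Kernels that use `__syncthreads()` or shared memory would need additional handling that this sketch leaves out.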

Original language: English (US)
Title of host publication: ICS'09 - Proceedings of the 23rd International Conference on Supercomputing
Pages: 515-516
Number of pages: 2
DOIs: https://doi.org/10.1145/1542275.1542357
State: Published - Nov 24 2009
Event: 23rd International Conference on Supercomputing, ICS'09 - Yorktown Heights, NY, United States
Duration: Jun 8 2009 to Jun 12 2009

Publication series

Name: Proceedings of the International Conference on Supercomputing

Other

Other: 23rd International Conference on Supercomputing, ICS'09
Country: United States
City: Yorktown Heights, NY
Period: 6/8/09 to 6/12/09

Keywords

  • CUDA programming model
  • Coarse-grained parallelism
  • FPGA
  • GPU
  • High performance computing
  • High-level synthesis

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J., & Hwu, W-M. W. (2009). High-performance CUDA kernel execution on FPGAs. In ICS'09 - Proceedings of the 23rd International Conference on Supercomputing (pp. 515-516). [1542357] (Proceedings of the International Conference on Supercomputing). https://doi.org/10.1145/1542275.1542357

@inproceedings{d77e49b1d55a4f0fac096ce530760d20,
title = "High-performance CUDA kernel execution on FPGAs",
abstract = "In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.",
keywords = "CUDA programming model, Coarse-grained parallelism, FPGA, GPU, High performance computing, High-level synthesis",
author = "Alexandros Papakonstantinou and Karthik Gururaj and Stratton, {John A.} and Deming Chen and Jason Cong and Hwu, {Wen-Mei W}",
year = "2009",
month = "11",
day = "24",
doi = "10.1145/1542275.1542357",
language = "English (US)",
isbn = "9781605584980",
series = "Proceedings of the International Conference on Supercomputing",
pages = "515--516",
booktitle = "ICS'09 - Proceedings of the 23rd International Conference on Supercomputing",

}

TY - GEN

T1 - High-performance CUDA kernel execution on FPGAs

AU - Papakonstantinou, Alexandros

AU - Gururaj, Karthik

AU - Stratton, John A.

AU - Chen, Deming

AU - Cong, Jason

AU - Hwu, Wen-Mei W

PY - 2009/11/24

Y1 - 2009/11/24

AB - In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.

KW - CUDA programming model

KW - Coarse-grained parallelism

KW - FPGA

KW - GPU

KW - High performance computing

KW - High-level synthesis

UR - http://www.scopus.com/inward/record.url?scp=70449717519&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449717519&partnerID=8YFLogxK

U2 - 10.1145/1542275.1542357

DO - 10.1145/1542275.1542357

M3 - Conference contribution

AN - SCOPUS:70449717519

SN - 9781605584980

T3 - Proceedings of the International Conference on Supercomputing

SP - 515

EP - 516

BT - ICS'09 - Proceedings of the 23rd International Conference on Supercomputing

ER -