High-performance CUDA kernel execution on FPGAs

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the parallelism exposed in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
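
The central step named in the abstract, turning SPMD thread blocks into C code, can be pictured with a small sketch. It assumes a hypothetical vector-add kernel; the names `vec_add`, `BLOCK_DIM`, `vec_add_block`, and `vec_add_grid` are illustrative and do not come from the paper. It shows only the general idea of replacing the implicit thread index with an explicit loop, and it does not reproduce the actual output of the flow or any AutoPilot-specific directives.

```c
#define BLOCK_DIM 4   /* assumed threads per block (hypothetical value) */

/* Original CUDA kernel, shown for reference only:
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block written as plain C: the implicit threadIdx.x becomes
 * an explicit loop. The iterations are independent of each other, which
 * is the parallelism a synthesis tool could unroll or pipeline. */
static void vec_add_block(int block_idx, const float *a, const float *b,
                          float *c, int n)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}

/* The grid becomes an outer loop over blocks. Blocks are also
 * independent, so each could in principle map to a separate core. */
static void vec_add_grid(const float *a, const float *b, float *c, int n)
{
    int num_blocks = (n + BLOCK_DIM - 1) / BLOCK_DIM;  /* round up */
    for (int block_idx = 0; block_idx < num_blocks; ++block_idx)
        vec_add_block(block_idx, a, b, c, n);
}
```

Calling `vec_add_grid(a, b, c, n)` on ordinary arrays produces the same element-wise sums the CUDA kernel would; the bounds check inside the loop covers a final, partially filled block.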

Original language: English (US)
Title of host publication: ICS'09 - Proceedings of the 23rd International Conference on Supercomputing
Pages: 515-516
Number of pages: 2
DOIs: https://doi.org/10.1145/1542275.1542357
State: Published - Nov 24 2009
Event: 23rd International Conference on Supercomputing, ICS'09 - Yorktown Heights, NY, United States
Duration: Jun 8 2009 - Jun 12 2009

Publication series

Name: Proceedings of the International Conference on Supercomputing

Keywords

  • CUDA programming model
  • Coarse-grained parallelism
  • FPGA
  • GPU
  • High performance computing
  • High-level synthesis

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J., & Hwu, W-M. W. (2009). High-performance CUDA kernel execution on FPGAs. In ICS'09 - Proceedings of the 23rd International Conference on Supercomputing (pp. 515-516). [1542357] (Proceedings of the International Conference on Supercomputing). https://doi.org/10.1145/1542275.1542357