TY - GEN
T1 - High-performance CUDA kernel execution on FPGAs
AU - Papakonstantinou, Alexandros
AU - Gururaj, Karthik
AU - Stratton, John A.
AU - Chen, Deming
AU - Cong, Jason
AU - Hwu, Wen Mei W.
PY - 2009
Y1 - 2009
N2 - In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
AB - In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
KW - CUDA programming model
KW - Coarse-grained parallelism
KW - FPGA
KW - GPU
KW - High performance computing
KW - High-level synthesis
UR - http://www.scopus.com/inward/record.url?scp=70449717519&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449717519&partnerID=8YFLogxK
U2 - 10.1145/1542275.1542357
DO - 10.1145/1542275.1542357
M3 - Conference contribution
AN - SCOPUS:70449717519
SN - 9781605584980
T3 - Proceedings of the International Conference on Supercomputing
SP - 515
EP - 516
BT - ICS'09 - Proceedings of the 23rd International Conference on Supercomputing
T2 - 23rd International Conference on Supercomputing, ICS'09
Y2 - 8 June 2009 through 12 June 2009
ER -