Integrated CUDA-to-FPGA synthesis with network-on-chip

Swathi T. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, Kyle Rupnow, Deming Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data-parallel languages such as CUDA and OpenCL efficiently describe many parallel threads of computation, and high-level synthesis (HLS) tools can effectively translate these descriptions into independent, optimized cores. As the number of instantiated cores grows, average external memory access latency can become a significant factor in system performance. However, although each core produces its outputs independently, the cores often share much of their input data. Exploiting on-chip data sharing both reduces external bandwidth demand and improves average memory access latency, so the system achieves higher performance with the same number of cores. In this paper, we develop an FPGA network-on-chip, coupled with computation cores synthesized from CUDA, that enables on-chip data sharing. We demonstrate reductions of up to 60% (56% on average) in external bandwidth demand and up to 43% (27% on average) in total application latency, measured in cycles.
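As a rough illustration of why sharing input data on chip cuts external traffic, the following sketch counts external memory fetches in a toy model. It is a hypothetical first-order model, not the authors' network-on-chip or synthesis flow. It assumes that, without sharing, each core fetches every distinct word it reads exactly once, and that, with sharing, each distinct word is fetched once from external memory and forwarded to the other requesting cores over the on-chip network. The function names `external_fetches` and `stencil_reads` and the overlapping-window access pattern are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def external_fetches(core_reads: Dict[int, List[int]]) -> Tuple[int, int]:
    """Return (fetches without sharing, fetches with on-chip sharing).

    Hypothetical model: without sharing, every core fetches each distinct
    word it reads from external memory itself. With sharing, each distinct
    word is fetched once and forwarded to the other cores on chip.
    """
    # Each core independently fetches its own set of distinct addresses.
    private = sum(len(set(addrs)) for addrs in core_reads.values())
    # With sharing, only the union of all requested addresses goes off chip.
    shared: Set[int] = set()
    for addrs in core_reads.values():
        shared.update(addrs)
    return private, len(shared)


def stencil_reads(num_cores: int, stride: int, window: int) -> Dict[int, List[int]]:
    """Illustrative access pattern: core c reads a window of `window` words
    starting at c * stride, so neighboring cores overlap when window > stride."""
    return {c: list(range(c * stride, c * stride + window)) for c in range(num_cores)}


reads = stencil_reads(num_cores=4, stride=4, window=6)
private, shared = external_fetches(reads)
reduction = 1 - shared / private
print(private, shared, f"{reduction:.0%}")  # 24 18 25%
```

Because the model ignores network latency, on-chip buffer capacity, and request timing, it gives only an idealized fetch count for one made-up access pattern; it does not reproduce or predict the measured results reported above.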

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 21-24
Number of pages: 4
ISBN (Electronic): 9781479951116
State: Published - Jul 21 2014
Event: 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014 - Boston, United States
Duration: May 11 2014 to May 13 2014

Publication series

Name: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014

Other

Other: 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014
Country/Territory: United States
City: Boston
Period: 5/11/14 to 5/13/14

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Electrical and Electronic Engineering
