Multilevel granularity parallelism synthesis on FPGAs

Alexandros Papakonstantinou, Yun Liang, John A. Stratton, Karthik Gururaj, Deming Chen, Wen-Mei W Hwu, Jason Cong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involve lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance in terms of both cycles and frequency. Evaluating the rich design space through the full implementation flow, starting with high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models, as well as design layout information, to derive a configuration with near-optimal performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.

Original language: English (US)
Title of host publication: Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
Pages: 178-185
Number of pages: 8
DOIs: https://doi.org/10.1109/FCCM.2011.29
State: Published - Jun 17 2011
Event: 19th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011 - Salt Lake City, UT, United States
Duration: May 1 2011 to May 3 2011

Publication series

Name: Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011

Other

Other: 19th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
Country: United States
City: Salt Lake City, UT
Period: 5/1/11 to 5/3/11

Keywords

  • Design Space Exploration
  • FPGA
  • High-Level Synthesis
  • Parallel Computing

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Papakonstantinou, A., Liang, Y., Stratton, J. A., Gururaj, K., Chen, D., Hwu, W-M. W., & Cong, J. (2011). Multilevel granularity parallelism synthesis on FPGAs. In Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011 (pp. 178-185). [5771270] (Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011). https://doi.org/10.1109/FCCM.2011.29