Multilevel granularity parallelism synthesis on FPGAs

Alexandros Papakonstantinou, Yun Liang, John A. Stratton, Karthik Gururaj, Deming Chen, Wen-Mei W Hwu, Jason Cong

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementing HLS-generated RTL and evaluating its performance involves lengthy logic synthesis and physical design flows. Moreover, the way different levels of coarse-grained parallelism are mapped onto hardware spatial parallelism affects the final FPGA-based performance in terms of both cycle count and clock frequency. Evaluating this rich design space through the full implementation flow, from high-level source code to routed netlist, is prohibitive in various scientific and computing domains, which hinders the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic, together with the estimation models and design layout information, to derive a near-optimal configuration in terms of performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that can significantly outperform previous related tools, while offering performance competitive with software kernel execution on GPUs at a fraction of the energy cost.
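The core trade-off described above, that execution time depends on both cycle count and achievable clock period and that both shift with how parallelism is split across granularities, can be illustrated with a small sketch. Everything in it is hypothetical: the two knobs (`cores` for replicated cores, `unroll` for lanes inside a core), the linear LUT model, the clock-period formula and all constants are illustrative assumptions, not the models used by ML-GPS. The search is also plain exhaustive enumeration over a tiny grid, not the paper's design space search heuristic.

```python
"""Toy design-space exploration sketch (hypothetical models, not ML-GPS itself)."""
from dataclasses import dataclass
from itertools import product

# Hypothetical constants chosen only for illustration.
LUT_BUDGET = 60_000      # total LUTs available on the device
CORE_LUTS = 2_000        # fixed control/memory logic per replicated core
LANE_LUTS = 1_500        # datapath logic per unrolled thread lane
WORK_ITEMS = 1_000_000   # total thread iterations in the kernel
BASE_PERIOD_NS = 4.0     # clock period of a small, uncongested design


@dataclass(frozen=True)
class Config:
    cores: int   # coarse grain: replicated kernel cores
    unroll: int  # finer grain: thread-loop unroll factor inside a core


def luts(c: Config) -> int:
    """Linear resource model: each core pays control cost plus its lanes."""
    return c.cores * (CORE_LUTS + c.unroll * LANE_LUTS)


def cycles(c: Config) -> float:
    """Ideal cycle model: work is spread evenly over all parallel lanes."""
    return WORK_ITEMS / (c.cores * c.unroll)


def period_ns(c: Config) -> float:
    """Clock model: wider cores lengthen paths, and high utilization adds
    routing delay (a crude stand-in for layout-aware estimation)."""
    utilization = luts(c) / LUT_BUDGET
    return BASE_PERIOD_NS + 0.2 * c.unroll + 4.0 * utilization ** 2


def exec_time_us(c: Config) -> float:
    """Estimated execution time = cycles x clock period (ns -> us)."""
    return cycles(c) * period_ns(c) / 1_000.0


def explore(core_opts, unroll_opts):
    """Exhaustively score every configuration that fits the LUT budget."""
    feasible = [Config(n, u) for n, u in product(core_opts, unroll_opts)
                if luts(Config(n, u)) <= LUT_BUDGET]
    return min(feasible, key=exec_time_us), feasible


if __name__ == "__main__":
    best, feasible = explore([1, 2, 4, 8, 16], [1, 2, 4, 8])
    for c in sorted(feasible, key=exec_time_us)[:3]:
        print(c, luts(c), f"{exec_time_us(c):.1f} us")
```

With these made-up constants the search selects `Config(cores=4, unroll=8)`. Note that `Config(16, 1)` and `Config(4, 4)` expose the same 16 parallel lanes and therefore the same estimated cycle count, yet the sketch ranks `Config(4, 4)` faster because its lower utilization yields a shorter estimated clock period. This is the kind of cycles-versus-frequency interaction that makes evaluating only cycle counts misleading.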

Original language: English (US)
Title of host publication: Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
Pages: 178-185
Number of pages: 8
DOI: 10.1109/FCCM.2011.29
State: Published - Jun 17 2011
Event: 19th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011 - Salt Lake City, UT, United States
Duration: May 1 2011 to May 3 2011

Publication series

Name: Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011

Keywords

  • Design Space Exploration
  • FPGA
  • High-Level Synthesis
  • Parallel Computing

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Papakonstantinou, A., Liang, Y., Stratton, J. A., Gururaj, K., Chen, D., Hwu, W-M. W., & Cong, J. (2011). Multilevel granularity parallelism synthesis on FPGAs. In Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011 (pp. 178-185). [5771270] (Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011). https://doi.org/10.1109/FCCM.2011.29

@inproceedings{e2fe86a56ad941ab8448ef9ee7bc9071,
title = "Multilevel granularity parallelism synthesis on FPGAs",
keywords = "Design Space Exploration, FPGA, High-Level Synthesis, Parallel Computing",
author = "Alexandros Papakonstantinou and Yun Liang and Stratton, {John A.} and Karthik Gururaj and Deming Chen and Hwu, {Wen-Mei W} and Jason Cong",
year = "2011",
month = "6",
day = "17",
doi = "10.1109/FCCM.2011.29",
language = "English (US)",
isbn = "9780769543017",
series = "Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011",
pages = "178--185",
booktitle = "Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011",

}