TY - GEN
T1 - Multilevel granularity parallelism synthesis on FPGAs
AU - Papakonstantinou, Alexandros
AU - Liang, Yun
AU - Stratton, John A.
AU - Gururaj, Karthik
AU - Chen, Deming
AU - Hwu, Wen Mei W.
AU - Cong, Jason
PY - 2011
Y1 - 2011
N2 - Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance both in terms of cycles and frequency. Evaluation of the rich design space through the full implementation flow, starting with high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order of efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models as well as design layout information to derive a performance near-optimal configuration. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.
AB - Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance both in terms of cycles and frequency. Evaluation of the rich design space through the full implementation flow, starting with high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order of efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models as well as design layout information to derive a performance near-optimal configuration. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.
KW - Design Space Exploration
KW - FPGA
KW - High-Level Synthesis
KW - Parallel Computing
UR - http://www.scopus.com/inward/record.url?scp=79958742174&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79958742174&partnerID=8YFLogxK
U2 - 10.1109/FCCM.2011.29
DO - 10.1109/FCCM.2011.29
M3 - Conference contribution
AN - SCOPUS:79958742174
SN - 9780769543017
T3 - Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
SP - 178
EP - 185
BT - Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
T2 - 19th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011
Y2 - 1 May 2011 through 3 May 2011
ER -