Throughput-oriented kernel porting onto FPGAs

Alexandros Papakonstantinou, Deming Chen, Wen-Mei W Hwu, Jason Cong, Liang Yun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work we propose a code optimization framework which analyzes and restructures CUDA kernels that are optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph in tandem with code motions and graph coloring of array variables is employed to restructure the kernel for high-throughput execution on FPGAs.
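The abstract names graph coloring of array variables but does not spell out the procedure. As a rough illustration only, the sketch below shows one generic way such a coloring can work: arrays whose live ranges overlap are connected in an interference graph, and a greedy pass gives each array the smallest color not used by its neighbors, so arrays with the same color could share one storage resource. The array names, the live ranges, and the helper functions `build_interference` and `greedy_color` are hypothetical and are not taken from the paper; the framework's actual algorithm may differ.

```python
from itertools import combinations


def build_interference(live_ranges):
    """Connect two arrays when their closed (start, end) live ranges overlap."""
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (start_a, end_a), (start_b, end_b) = live_ranges[a], live_ranges[b]
        if start_a <= end_b and start_b <= end_a:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_color(graph):
    """Give each node the smallest color not used by an already-colored neighbor."""
    colors = {}
    # Visit high-degree nodes first; break ties by name so the result is deterministic.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical arrays and live ranges (program-point indices), for illustration only.
live_ranges = {
    "tile_a": (0, 4),
    "tile_b": (2, 6),
    "partial": (5, 9),
    "result": (7, 10),
}
interference = build_interference(live_ranges)
coloring = greedy_color(interference)
```

In this toy input, arrays that are never live at the same time end up with the same color, which is the property a storage-sharing pass would rely on. Greedy coloring is a heuristic and does not guarantee the minimum number of colors in general.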

Original language: English (US)
Title of host publication: Proceedings of the 50th Annual Design Automation Conference, DAC 2013
DOI: https://doi.org/10.1145/2463209.2488747
State: Published - Jul 12 2013
Event: 50th Annual Design Automation Conference, DAC 2013 - Austin, TX, United States
Duration: May 29 2013 to Jun 7 2013

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X

Other

Other: 50th Annual Design Automation Conference, DAC 2013
Country: United States
City: Austin, TX
Period: 5/29/13 to 6/7/13

Fingerprint

Field Programmable Gate Array
Field programmable gate arrays (FPGA)
Throughput
kernel
Coloring
High Throughput
Particle accelerators
Programming
Graph Coloring
Heterogeneous Systems
Processing
Accelerator
Parallel Processing
Usability
Annotation
Eliminate
Synthesis
Motion
Optimization
Requirements

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation

Cite this

Papakonstantinou, A., Chen, D., Hwu, W-M. W., Cong, J., & Yun, L. (2013). Throughput-oriented kernel porting onto FPGAs. In Proceedings of the 50th Annual Design Automation Conference, DAC 2013 [11] (Proceedings - Design Automation Conference). https://doi.org/10.1145/2463209.2488747

@inproceedings{d1f0ca970115423a90300bab6d300455,
title = "Throughput-oriented kernel porting onto FPGAs",
abstract = "Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work we propose a code optimization framework which analyzes and restructures CUDA kernels that are optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph in tandem with code motions and graph coloring of array variables is employed to restructure the kernel for high-throughput execution on FPGAs.",
author = "Alexandros Papakonstantinou and Deming Chen and Hwu, {Wen-Mei W} and Jason Cong and Liang Yun",
year = "2013",
month = "7",
day = "12",
doi = "10.1145/2463209.2488747",
language = "English (US)",
isbn = "9781450320719",
series = "Proceedings - Design Automation Conference",
booktitle = "Proceedings of the 50th Annual Design Automation Conference, DAC 2013",

}

TY - GEN

T1 - Throughput-oriented kernel porting onto FPGAs

AU - Papakonstantinou, Alexandros

AU - Chen, Deming

AU - Hwu, Wen-Mei W

AU - Cong, Jason

AU - Yun, Liang

PY - 2013/7/12

Y1 - 2013/7/12

N2 - Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work we propose a code optimization framework which analyzes and restructures CUDA kernels that are optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph in tandem with code motions and graph coloring of array variables is employed to restructure the kernel for high-throughput execution on FPGAs.

UR - http://www.scopus.com/inward/record.url?scp=84879867459&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879867459&partnerID=8YFLogxK

U2 - 10.1145/2463209.2488747

DO - 10.1145/2463209.2488747

M3 - Conference contribution

AN - SCOPUS:84879867459

SN - 9781450320719

T3 - Proceedings - Design Automation Conference

BT - Proceedings of the 50th Annual Design Automation Conference, DAC 2013

ER -