TY - JOUR
T1 - Exploring many-core design templates for FPGAs and ASICs
AU - Lebedev, Ilia
AU - Fletcher, Christopher
AU - Cheng, Shaoyi
AU - Martin, James
AU - Doupnik, Austin
AU - Burke, Daniel
AU - Lin, Mingjie
AU - Wawrzynek, John
PY - 2012
Y1 - 2012
N2 - We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full-custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.
UR - http://www.scopus.com/inward/record.url?scp=84855263806&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84855263806&partnerID=8YFLogxK
U2 - 10.1155/2012/439141
DO - 10.1155/2012/439141
M3 - Article
AN - SCOPUS:84855263806
SN - 1687-7195
VL - 2012
JO - International Journal of Reconfigurable Computing
JF - International Journal of Reconfigurable Computing
M1 - 439141
ER -