TY - GEN
T1 - HLS-Based Acceleration Framework for Deep Convolutional Neural Networks
AU - Misra, Ashish
AU - Kindratenko, Volodymyr
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Deep Neural Networks (DNNs) have been successfully applied in many fields. Considering performance, flexibility, and energy efficiency, Field Programmable Gate Array (FPGA) based accelerators for DNNs are a promising solution. Existing frameworks, however, lack reusability and make it difficult to design a new network with minimal effort. Modern high-level synthesis (HLS) tools greatly reduce the turnaround time of designing and implementing complex FPGA-based accelerators. This paper presents a framework for building hardware accelerators for DNNs from a high-level specification. A novel architecture is introduced that maximizes data reuse and external memory bandwidth. The framework generates scalable HLS code for a given pre-trained model that can be mapped to different FPGA platforms. Various HLS compiler optimizations have been applied to the code to produce an efficient implementation and high resource utilization. The framework achieves a peak performance of 23 frames per second for SqueezeNet on a Xilinx Alveo U250 board.
AB - Deep Neural Networks (DNNs) have been successfully applied in many fields. Considering performance, flexibility, and energy efficiency, Field Programmable Gate Array (FPGA) based accelerators for DNNs are a promising solution. Existing frameworks, however, lack reusability and make it difficult to design a new network with minimal effort. Modern high-level synthesis (HLS) tools greatly reduce the turnaround time of designing and implementing complex FPGA-based accelerators. This paper presents a framework for building hardware accelerators for DNNs from a high-level specification. A novel architecture is introduced that maximizes data reuse and external memory bandwidth. The framework generates scalable HLS code for a given pre-trained model that can be mapped to different FPGA platforms. Various HLS compiler optimizations have been applied to the code to produce an efficient implementation and high resource utilization. The framework achieves a peak performance of 23 frames per second for SqueezeNet on a Xilinx Alveo U250 board.
KW - Accelerator design
KW - FPGA
KW - High level synthesis
UR - http://www.scopus.com/inward/record.url?scp=85083037791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083037791&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-44534-8_17
DO - 10.1007/978-3-030-44534-8_17
M3 - Conference contribution
AN - SCOPUS:85083037791
SN - 9783030445331
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 221
EP - 231
BT - Applied Reconfigurable Computing. Architectures, Tools, and Applications - 16th International Symposium, ARC 2020, Proceedings
A2 - Rincón, Fernando
A2 - Barba, Jesús
A2 - Caba, Julián
A2 - So, Hayden K.H.
A2 - Diniz, Pedro
PB - Springer
T2 - 16th International Symposium on Applied Reconfigurable Computing, ARC 2020
Y2 - 1 April 2020 through 3 April 2020
ER -