TY - GEN
T1 - Efficient HW and SW Interface Design for Convolutional Neural Networks Using High-Level Synthesis and TensorFlow
AU - Misra, Ashish
AU - He, Churan
AU - Kindratenko, Volodymyr
N1 - Funding Information:
This work is funded by the National Science Foundation's Major Research Instrumentation program, grant #1725729.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Hardware accelerators are widely used to deploy convolutional neural networks (CNNs) because they offer speedup by exploiting the parallelism inherent in CNNs. The development of such accelerators spans a large design space and involves a complex execution model that includes both software and hardware modules. The figures of merit of an accelerator are its frequency of operation, the number of operations performed per unit time, and the configurations it supports; designing such accelerators is therefore a multi-objective optimization problem. This work presents a systematic approach, using Xilinx Vitis-HLS, to developing an efficient framework for CNNs that satisfies these figures of merit and can be scaled to different configurations. High-level synthesis (HLS) has proved to be a promising way to describe large, complex designs in a short time. The presented framework uses four copies of a single unified module to execute convolution and pooling in hardware, and uses TensorFlow with multiprocessing to run certain layers in software. The framework has been evaluated with SqueezeNet 1.0, VGG-16, and ResNet-50 at a 250 MHz clock frequency on the Xilinx Alveo U250 board, achieving 750 GOPS.
KW - Accelerator design
KW - High-level synthesis
KW - TensorFlow
UR - http://www.scopus.com/inward/record.url?scp=85124220403&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124220403&partnerID=8YFLogxK
U2 - 10.1109/H2RC54759.2021.00006
DO - 10.1109/H2RC54759.2021.00006
M3 - Conference contribution
AN - SCOPUS:85124220403
T3 - Proceedings of H2RC 2021: 7th International Workshop on Heterogeneous High-Performance Reconfigurable Computing, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 1
EP - 8
BT - Proceedings of H2RC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th IEEE/ACM International Workshop on Heterogeneous High-Performance Reconfigurable Computing, H2RC 2021
Y2 - 15 November 2021
ER -