TY - GEN
T1 - Face recognition with hybrid efficient convolution algorithms on FPGAs
AU - Zhuge, Chuanhao
AU - Liu, Xinheng
AU - Zhang, Xiaofan
AU - Gummadi, Sudeep
AU - Xiong, Jinjun
AU - Chen, Deming
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/5/30
Y1 - 2018/5/30
AB - Deep Convolutional Neural Networks (CNNs) have become a Swiss Army knife for solving critical artificial intelligence tasks. However, deploying deep CNN models for latency-critical tasks remains challenging because of the complex nature of CNNs. Recently, FPGAs have become favorable devices for accelerating deep CNNs thanks to their high parallel processing capability and energy efficiency. In this work, we explore different fast convolution algorithms, including Winograd and Fast Fourier Transform (FFT), and find an optimal strategy to apply them together on different types of convolutions. We also propose an optimization scheme to exploit parallelism on novel CNN architectures such as Inception modules in GoogLeNet. We implement a configurable IP-based face recognition acceleration system based on FaceNet using High-Level Synthesis. Our implementation on a Xilinx UltraScale device achieves a 3.75x latency speedup compared to a high-end NVIDIA GPU and significantly surpasses previous FPGA results.
UR - http://www.scopus.com/inward/record.url?scp=85049459489&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049459489&partnerID=8YFLogxK
U2 - 10.1145/3194554.3194597
DO - 10.1145/3194554.3194597
M3 - Conference contribution
AN - SCOPUS:85049459489
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 123
EP - 128
BT - GLSVLSI 2018 - Proceedings of the 2018 Great Lakes Symposium on VLSI
PB - Association for Computing Machinery
T2 - 28th Great Lakes Symposium on VLSI, GLSVLSI 2018
Y2 - 23 May 2018 through 25 May 2018
ER -