TY - GEN
T1 - Machine learning on FPGAs to face the IoT revolution
AU - Zhang, Xiaofan
AU - Ramachandran, Anand
AU - Zhuge, Chuanhao
AU - He, Di
AU - Zuo, Wei
AU - Cheng, Zuofu
AU - Rupnow, Kyle
AU - Chen, Deming
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/13
Y1 - 2017/12/13
AB - FPGAs have been rapidly adopted for acceleration of Deep Neural Networks (DNNs) with improved latency and energy efficiency compared to CPU and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs due to improved productivity, debugging, and design space exploration ability. However, optimizing large neural networks under resource constraints for FPGAs is still a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions including Long-term Recurrent Convolution Network (LRCN) for video captioning, Inception module for FaceNet face recognition, as well as Long Short-Term Memory (LSTM) for sound recognition. These and other similar DNN solutions are ideal implementations to be deployed in vision or sound based IoT applications.
KW - FPGAs
KW - Internet of Things
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85043515848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043515848&partnerID=8YFLogxK
U2 - 10.1109/ICCAD.2017.8203875
DO - 10.1109/ICCAD.2017.8203875
M3 - Conference contribution
AN - SCOPUS:85043515848
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
SP - 894
EP - 901
BT - 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
Y2 - 13 November 2017 through 16 November 2017
ER -