Abstract
FPGAs have been rapidly adopted for accelerating Deep Neural Networks (DNNs), offering improved latency and energy efficiency over CPU- and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs thanks to its improved productivity, debugging support, and design-space exploration capability. However, optimizing large neural networks under FPGA resource constraints remains a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, an Inception module for FaceNet face recognition, and a Long Short-Term Memory (LSTM) network for sound recognition. These and similar DNN solutions are well suited for deployment in vision- or sound-based IoT applications.
Original language | English (US) |
---|---|
Title of host publication | 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 894-901 |
Number of pages | 8 |
ISBN (Electronic) | 9781538630938 |
DOIs | https://doi.org/10.1109/ICCAD.2017.8203875 |
State | Published - Dec 13 2017 |
Event | 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017 - Irvine, United States (duration: Nov 13 2017 → Nov 16 2017) |
Publication series
Name | IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD |
---|---|
Volume | 2017-November |
ISSN (Print) | 1092-3152 |
Other
Other | 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017 |
---|---|
Country | United States |
City | Irvine |
Period | 11/13/17 → 11/16/17 |
Keywords
- FPGAs
- Internet of Things
- Machine Learning
ASJC Scopus subject areas
- Software
- Computer Science Applications
- Computer Graphics and Computer-Aided Design
Cite this
Machine learning on FPGAs to face the IoT revolution. / Zhang, Xiaofan; Ramachandran, Anand; Zhuge, Chuanhao; He, Di; Zuo, Wei; Cheng, Zuofu; Rupnow, Kyle; Chen, Deming.
2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 894-901 (IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD; Vol. 2017-November).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Machine learning on FPGAs to face the IoT revolution
AU - Zhang, Xiaofan
AU - Ramachandran, Anand
AU - Zhuge, Chuanhao
AU - He, Di
AU - Zuo, Wei
AU - Cheng, Zuofu
AU - Rupnow, Kyle
AU - Chen, Deming
PY - 2017/12/13
Y1 - 2017/12/13
KW - FPGAs
KW - Internet of Things
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85043515848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043515848&partnerID=8YFLogxK
U2 - 10.1109/ICCAD.2017.8203875
DO - 10.1109/ICCAD.2017.8203875
M3 - Conference contribution
AN - SCOPUS:85043515848
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
SP - 894
EP - 901
BT - 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
PB - Institute of Electrical and Electronics Engineers Inc.
ER -