TY - GEN
T1 - AutoDNNchip
T2 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2020
AU - Xu, Pengfei
AU - Zhang, Xiaofan
AU - Hao, Cong
AU - Zhao, Yang
AU - Zhang, Yongan
AU - Wang, Yue
AU - Li, Chaojian
AU - Guan, Zetong
AU - Chen, Deming
AU - Lin, Yingyan
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/2/23
Y1 - 2020/2/23
N2 - Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a growing demand for domain-specific hardware accelerators (i.e., DNN chips). However, designing DNN chips is non-trivial because: (1) mainstream DNNs have millions of parameters and operations; (2) the design space is large due to the numerous design choices of dataflows, processing elements, memory hierarchy, etc.; and (3) an algorithm/hardware co-design is needed to allow the same DNN functionality to have a different decomposition that would require different hardware IPs that correspond to dramatically different performance/energy/area tradeoffs. Therefore, DNN chips often take months to years to design and require a large team of cross-disciplinary experts. To enable fast and effective DNN chip design, we propose AutoDNNchip - a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementation (i.e., synthesizable RTL code with optimized algorithm-to-hardware mapping (i.e., dataflow)) given DNNs from machine learning frameworks (e.g., PyTorch) for a designated application and dataset without humans in the loop. Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, latency, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balance, etc.), optimize chip design via the Chip Predictor, and then generate synthesizable RTL code with optimized dataflows to achieve the target design metrics. Experimental results show that our Chip Predictor's predicted performance differs from the real-measured one by <10% when validated using 15 DNN models and 4 platforms (edge-FPGA/TPU/GPU and ASIC). Furthermore, both the FPGA- and ASIC-based DNN accelerators generated by our AutoDNNchip can achieve better (up to 3.86× improvement) performance than that of expert-crafted state-of-the-art accelerators, showing the effectiveness of AutoDNNchip. Our open-source code can be found at https://github.com/RICE-EIC/AutoDNNchip.git.
AB - Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a growing demand for domain-specific hardware accelerators (i.e., DNN chips). However, designing DNN chips is non-trivial because: (1) mainstream DNNs have millions of parameters and operations; (2) the design space is large due to the numerous design choices of dataflows, processing elements, memory hierarchy, etc.; and (3) an algorithm/hardware co-design is needed to allow the same DNN functionality to have a different decomposition that would require different hardware IPs that correspond to dramatically different performance/energy/area tradeoffs. Therefore, DNN chips often take months to years to design and require a large team of cross-disciplinary experts. To enable fast and effective DNN chip design, we propose AutoDNNchip - a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementation (i.e., synthesizable RTL code with optimized algorithm-to-hardware mapping (i.e., dataflow)) given DNNs from machine learning frameworks (e.g., PyTorch) for a designated application and dataset without humans in the loop. Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, latency, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balance, etc.), optimize chip design via the Chip Predictor, and then generate synthesizable RTL code with optimized dataflows to achieve the target design metrics. Experimental results show that our Chip Predictor's predicted performance differs from the real-measured one by <10% when validated using 15 DNN models and 4 platforms (edge-FPGA/TPU/GPU and ASIC). Furthermore, both the FPGA- and ASIC-based DNN accelerators generated by our AutoDNNchip can achieve better (up to 3.86× improvement) performance than that of expert-crafted state-of-the-art accelerators, showing the effectiveness of AutoDNNchip. Our open-source code can be found at https://github.com/RICE-EIC/AutoDNNchip.git.
UR - http://www.scopus.com/inward/record.url?scp=85082071251&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082071251&partnerID=8YFLogxK
U2 - 10.1145/3373087.3375306
DO - 10.1145/3373087.3375306
M3 - Conference contribution
AN - SCOPUS:85082071251
T3 - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 40
EP - 50
BT - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery
Y2 - 23 February 2020 through 25 February 2020
ER -