TY - JOUR
T1 - DNNExplorer
T2 - 39th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2020
AU - Zhang, Xiaofan
AU - Ye, Hanchen
AU - Wang, Junsong
AU - Lin, Yonghua
AU - Xiong, Jinjun
AU - Hwu, Wen Mei
AU - Chen, Deming
N1 - Funding Information:
This work was supported in part by the IBM-Illinois Center for Cognitive Computing System Research (C3SR) – a research collaboration as part of IBM AI Horizons Network. Xiaofan Zhang is supported by a Google PhD Fellowship.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/11/2
Y1 - 2020/11/2
N2 - Existing FPGA-based DNN accelerators typically fall into two design paradigms. They either adopt a generic reusable architecture that supports different DNN networks but leaves some performance and efficiency on the table by sacrificing design specificity, or they apply a layer-wise tailor-made architecture that optimizes layer-specific demands for computation and resources but loses the scalability needed to adapt to a wide range of DNN networks. To overcome these drawbacks, this paper proposes a novel FPGA-based DNN accelerator design paradigm and its automation tool, called DNNExplorer, to enable fast exploration of various accelerator designs under the proposed paradigm and deliver optimized accelerator architectures for existing and emerging DNN networks. Three key techniques are essential for DNNExplorer's improved performance, better specificity, and scalability: (1) a unique accelerator design paradigm with both high-dimensional design space support and fine-grained adjustability, (2) a dynamic design space to accommodate different combinations of DNN workloads and targeted FPGAs, and (3) a design space exploration (DSE) engine to generate optimized accelerator architectures following the proposed paradigm by simultaneously considering both FPGAs' computation and memory resources and DNN networks' layer-wise characteristics and overall complexity. Experimental results show that, for the same FPGAs, accelerators generated by DNNExplorer can deliver up to 4.2x higher performance (GOP/s) than the state-of-the-art layer-wise pipelined solutions generated by DNNBuilder [1] for a VGG-like DNN with 38 CONV layers. Compared to accelerators with generic reusable computation units, DNNExplorer achieves up to 2.0x and 4.4x DSP efficiency improvement over a recently published accelerator design from academia (HybridDNN [2]) and a commercial DNN accelerator IP (Xilinx DPU [3]), respectively.
UR - http://www.scopus.com/inward/record.url?scp=85097935130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097935130&partnerID=8YFLogxK
U2 - 10.1145/3400302.3415609
DO - 10.1145/3400302.3415609
M3 - Conference article
AN - SCOPUS:85097935130
SN - 1092-3152
VL - 2020-November
JO - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers
JF - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers
M1 - 9256813
Y2 - 2 November 2020 through 5 November 2020
ER -