TY - GEN
T1 - EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions
T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020
AU - Li, Yuhong
AU - Hao, Cong
AU - Zhang, Xiaofan
AU - Liu, Xinheng
AU - Chen, Yao
AU - Xiong, Jinjun
AU - Hwu, Wen-Mei
AU - Chen, Deming
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - High-quality AI solutions require joint optimization of AI algorithms and their hardware implementations. In this work, we are the first to propose a fully simultaneous, Efficient Differentiable DNN (deep neural network) architecture and implementation co-search (EDD) methodology. We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and we maximize both algorithm accuracy and hardware implementation quality. The formulation is differentiable with respect to the fused variables, so that a gradient descent algorithm can be applied to greatly reduce the search time. The formulation is also applicable to various devices with different objectives. In the experiments, we demonstrate the effectiveness of our EDD methodology by searching for three representative DNNs, targeting low-latency GPU implementation and FPGA implementations with both recursive and pipelined architectures. Each model produced by EDD achieves accuracy similar to that of the best existing DNN models searched by neural architecture search (NAS) methods on ImageNet, but with superior performance obtained within 12-GPU-hour searches. Our DNN targeting GPU is 1.40× faster than the state-of-the-art solution reported in Proxyless [1], and our DNN targeting FPGA delivers 1.45× higher throughput than the state-of-the-art solution reported in DNNBuilder [2].
UR - http://www.scopus.com/inward/record.url?scp=85091300706&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091300706&partnerID=8YFLogxK
U2 - 10.1109/DAC18072.2020.9218749
DO - 10.1109/DAC18072.2020.9218749
M3 - Conference contribution
AN - SCOPUS:85091300706
T3 - Proceedings - Design Automation Conference
BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 July 2020 through 24 July 2020
ER -