TY - GEN
T1 - Towards an Accurate Latency Model for Convolutional Neural Network Layers on GPUs
AU - Li, Jinyang
AU - Ma, Runyu
AU - Mailthody, Vikram Sharma
AU - Samplawski, Colin
AU - Marlin, Benjamin
AU - Chen, Songqing
AU - Yao, Shuochao
AU - Abdelzaher, Tarek
N1 - Funding Information:
Research reported in this paper was sponsored in part by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 and NSF under award CPS 20-38817, and in part by The Boeing Company. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, NSF, Boeing, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Convolutional Neural Networks (CNNs) have shown great success in many sensing and recognition applications. However, their excessive resource demand remains a major barrier to deployment on low-end devices. Optimizations, such as model compression, are thus needed for practical deployment. To fully exploit existing system resources, platform-aware optimizations have emerged in recent years, for which an execution-time model is a necessity. However, non-monotonicity over the network configuration space makes execution-time modeling a challenging task. Data-driven approaches have the advantage of being portable across platforms by treating the hardware and software stack as a black box, but at the cost of extremely long profiling time. On the other hand, analytical models from the architecture and systems literature do not need heavy profiling but require laborious analysis by domain experts. In this paper, we focus on building a general latency model for convolutional layers, which account for the majority of the total execution time in CNN models. We identify two major non-linear modes in the relationship between latency and convolution parameters, and analyze the mechanism behind them. The resulting model has better interpretability and can reduce profiling workload. The evaluation results show that our model outperforms baselines on different platforms and CNN models.
AB - Convolutional Neural Networks (CNNs) have shown great success in many sensing and recognition applications. However, their excessive resource demand remains a major barrier to deployment on low-end devices. Optimizations, such as model compression, are thus needed for practical deployment. To fully exploit existing system resources, platform-aware optimizations have emerged in recent years, for which an execution-time model is a necessity. However, non-monotonicity over the network configuration space makes execution-time modeling a challenging task. Data-driven approaches have the advantage of being portable across platforms by treating the hardware and software stack as a black box, but at the cost of extremely long profiling time. On the other hand, analytical models from the architecture and systems literature do not need heavy profiling but require laborious analysis by domain experts. In this paper, we focus on building a general latency model for convolutional layers, which account for the majority of the total execution time in CNN models. We identify two major non-linear modes in the relationship between latency and convolution parameters, and analyze the mechanism behind them. The resulting model has better interpretability and can reduce profiling workload. The evaluation results show that our model outperforms baselines on different platforms and CNN models.
UR - http://www.scopus.com/inward/record.url?scp=85124134833&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124134833&partnerID=8YFLogxK
U2 - 10.1109/MILCOM52596.2021.9652907
DO - 10.1109/MILCOM52596.2021.9652907
M3 - Conference contribution
AN - SCOPUS:85124134833
T3 - Proceedings - IEEE Military Communications Conference MILCOM
SP - 904
EP - 909
BT - MILCOM 2021 - 2021 IEEE Military Communications Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Military Communications Conference, MILCOM 2021
Y2 - 29 November 2021 through 2 December 2021
ER -