Towards an Accurate Latency Model for Convolutional Neural Network Layers on GPUs

Jinyang Li, Runyu Ma, Vikram Sharma Mailthody, Colin Samplawski, Benjamin Marlin, Songqing Chen, Shuochao Yao, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Convolutional Neural Networks (CNNs) have shown great success in many sensing and recognition applications. However, their excessive resource demand remains a major barrier to deployment on low-end devices. Optimizations such as model compression are thus needed for practical deployment. To fully exploit existing system resources, platform-aware optimizations have emerged in recent years, for which an execution-time model is a necessity. However, non-monotonicity over the network configuration space makes execution-time modeling a challenging task. Data-driven approaches have the advantage of being portable across platforms because they treat the hardware and software stack as a black box, but at the cost of extremely long profiling time. Analytical models from the architecture and systems literature, on the other hand, do not need heavy profiling but require laborious analysis by domain experts. In this paper, we focus on building a general latency model for convolutional layers, which account for the majority of the total execution time in CNN models. We identify two major non-linear modes in the relationship between latency and convolution parameters and analyze the mechanisms behind them. The resulting model has better interpretability and can reduce the profiling workload. The evaluation results show that our model outperforms baselines on different platforms and CNN models.

Original language: English (US)
Title of host publication: MILCOM 2021 - 2021 IEEE Military Communications Conference
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781665439565
State: Published - 2021
Event: 2021 IEEE Military Communications Conference, MILCOM 2021 - San Diego, United States
Duration: Nov 29 2021 – Dec 2 2021

Publication series

Name: Proceedings - IEEE Military Communications Conference MILCOM


Conference: 2021 IEEE Military Communications Conference, MILCOM 2021
Country/Territory: United States
City: San Diego

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

