TY - GEN
T1 - Scheduling real-time deep learning services as imprecise computations
AU - Yao, Shuochao
AU - Hao, Yifan
AU - Zhao, Yiran
AU - Shao, Huajie
AU - Liu, Dongxin
AU - Liu, Shengzhong
AU - Wang, Tianshi
AU - Li, Jinyang
AU - Abdelzaher, Tarek
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
N2 - The paper presents a real-time computing framework for intelligent real-time edge services that act on behalf of local embedded devices that are themselves unable to support extensive computations. The work contributes to a new direction in real-time computing that develops scheduling algorithms for machine intelligence tasks that enable anytime prediction. We show that deep neural network workflows can be cast as imprecise computations, each with a mandatory part and (several) optional parts whose execution utility depends on input data. With our design, deep neural networks can be preempted before their completion and support anytime inference. The goal of the real-time scheduler is to maximize the average accuracy of deep neural network outputs while meeting task deadlines, thanks to opportunistic shedding of the least necessary optional parts. The work is motivated by the proliferation of increasingly ubiquitous but resource-constrained embedded devices (for applications ranging from autonomous cars to the Internet of Things) and the desire to develop services that endow them with intelligence. Experiments on recent GPU hardware and a state-of-the-art deep neural network for machine vision illustrate that our scheme can increase overall accuracy by 10%-20% while incurring (nearly) no deadline misses.
AB - The paper presents a real-time computing framework for intelligent real-time edge services that act on behalf of local embedded devices that are themselves unable to support extensive computations. The work contributes to a new direction in real-time computing that develops scheduling algorithms for machine intelligence tasks that enable anytime prediction. We show that deep neural network workflows can be cast as imprecise computations, each with a mandatory part and (several) optional parts whose execution utility depends on input data. With our design, deep neural networks can be preempted before their completion and support anytime inference. The goal of the real-time scheduler is to maximize the average accuracy of deep neural network outputs while meeting task deadlines, thanks to opportunistic shedding of the least necessary optional parts. The work is motivated by the proliferation of increasingly ubiquitous but resource-constrained embedded devices (for applications ranging from autonomous cars to the Internet of Things) and the desire to develop services that endow them with intelligence. Experiments on recent GPU hardware and a state-of-the-art deep neural network for machine vision illustrate that our scheme can increase overall accuracy by 10%-20% while incurring (nearly) no deadline misses.
UR - http://www.scopus.com/inward/record.url?scp=85092742331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092742331&partnerID=8YFLogxK
U2 - 10.1109/RTCSA50079.2020.9203676
DO - 10.1109/RTCSA50079.2020.9203676
M3 - Conference contribution
AN - SCOPUS:85092742331
T3 - 2020 IEEE 26th International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2020
BT - 2020 IEEE 26th International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2020
Y2 - 19 August 2020 through 21 August 2020
ER -