TY - JOUR
T1 - Mobiprox
T2 - Supporting Dynamic Approximate Computing on Mobiles
AU - Fabjancic, Matevz
AU - Machidon, Octavian
AU - Sharif, Hashim
AU - Zhao, Yifan
AU - Misailovic, Sasa
AU - Pejovic, Veljko
N1 - This work was supported in part by the Slovenian Research Agency through the Bringing Resource Efficiency to Smartphones With Approximate Computing Project under Grant N2-0136; in part by the Context-Aware On-Device Approximate Computing project under Grant J2-3047; and in part by the Research Programmes under Grant P2-0098 and Grant P2-0426.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - Runtime-tunable context-dependent network compression would make mobile deep learning (DL) adaptable to often varying resource availability, input 'difficulty,' or user needs. The existing compression techniques significantly reduce the memory, processing, and energy tax of DL, yet the resulting models tend to be permanently impaired, sacrificing inference power for reduced resource usage. The existing tunable compression approaches, on the other hand, require expensive retraining, do not support arbitrary strategies for adapting the compression, and do not provide mobile-ready implementations. In this article, we present Mobiprox, a framework enabling mobile DL with flexible precision. Mobiprox implements tunable approximations of tensor operations and enables runtime-adaptable approximation of individual network layers. A profiler and a tuner included with Mobiprox identify the most promising neural network approximation configurations leading to the desired inference quality with the minimal use of resources. Furthermore, we develop control strategies that, depending on contextual factors such as the input data difficulty, dynamically adjust the approximation levels across a mobile DL model's layers. We implement Mobiprox on Android OS and, through experiments in diverse mobile domains, including human activity recognition and spoken keyword detection, demonstrate that it can save up to 15% system-wide energy with a minimal impact on the inference accuracy.
AB - Runtime-tunable context-dependent network compression would make mobile deep learning (DL) adaptable to often varying resource availability, input 'difficulty,' or user needs. The existing compression techniques significantly reduce the memory, processing, and energy tax of DL, yet the resulting models tend to be permanently impaired, sacrificing inference power for reduced resource usage. The existing tunable compression approaches, on the other hand, require expensive retraining, do not support arbitrary strategies for adapting the compression, and do not provide mobile-ready implementations. In this article, we present Mobiprox, a framework enabling mobile DL with flexible precision. Mobiprox implements tunable approximations of tensor operations and enables runtime-adaptable approximation of individual network layers. A profiler and a tuner included with Mobiprox identify the most promising neural network approximation configurations leading to the desired inference quality with the minimal use of resources. Furthermore, we develop control strategies that, depending on contextual factors such as the input data difficulty, dynamically adjust the approximation levels across a mobile DL model's layers. We implement Mobiprox on Android OS and, through experiments in diverse mobile domains, including human activity recognition and spoken keyword detection, demonstrate that it can save up to 15% system-wide energy with a minimal impact on the inference accuracy.
KW - Adaptation models
KW - approximate computing
KW - Computational modeling
KW - context-awareness
KW - Deep learning
KW - Hardware
KW - mobile deep learning
KW - Quantization (signal)
KW - Runtime
KW - Tensors
KW - ubiquitous computing
KW - mobile deep learning (DL)
UR - http://www.scopus.com/inward/record.url?scp=85187277075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187277075&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2024.3365957
DO - 10.1109/JIOT.2024.3365957
M3 - Article
AN - SCOPUS:85187277075
SN - 2327-4662
VL - 11
SP - 16873
EP - 16886
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 9
ER -