TY - GEN
T1 - ApproxTuner: A Compiler and Runtime System for Adaptive Approximations
T2 - 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021
AU - Sharif, Hashim
AU - Zhao, Yifan
AU - Kotsifakou, Maria
AU - Kothari, Akash
AU - Schreiber, Ben
AU - Wang, Elizabeth
AU - Sarita, Yasmin
AU - Zhao, Nathan
AU - Joshi, Keyur
AU - Adve, Vikram S.
AU - Misailovic, Sasa
AU - Adve, Sarita
N1 - Funding Information:
This work is supported in part by DARPA through the Domain-Specific System on Chip (DSSoC) program, the National Science Foundation under Grants CCF 17-03637, CCF 18-46354, and CCF 19-56374, a Google Faculty Research award, a grant from the Amazon AWS Machine Learning Research Awards Program, and by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.
Publisher Copyright:
© 2021 ACM.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/2/17
Y1 - 2021/2/17
N2 - Manually optimizing the tradeoffs between accuracy, performance and energy for resource-intensive applications with flexible accuracy or precision requirements is extremely difficult. We present ApproxTuner, an automatic framework for accuracy-aware optimization of tensor-based applications while requiring only high-level end-to-end quality specifications. ApproxTuner implements and manages approximations in algorithms, system software, and hardware. The key contribution in ApproxTuner is a novel three-phase approach to approximation-tuning that consists of development-time, install-time, and run-time phases. Our approach decouples tuning of hardware-independent and hardware-specific approximations, thus providing retargetability across devices. To enable efficient autotuning of approximation choices, we present a novel accuracy-aware tuning technique called predictive approximation-tuning, which significantly speeds up autotuning by analytically predicting the accuracy impacts of approximations. We evaluate ApproxTuner across 10 convolutional neural networks (CNNs) and a combined CNN and image processing benchmark. For the evaluated CNNs, using only hardware-independent approximation choices we achieve a mean speedup of 2.1x (max 2.7x) on a GPU, and 1.3x mean speedup (max 1.9x) on the CPU, while staying within 1 percentage point of inference accuracy loss. For two different accuracy-prediction models, ApproxTuner speeds up tuning by 12.8x and 20.4x compared to conventional empirical tuning while achieving comparable benefits.
AB - Manually optimizing the tradeoffs between accuracy, performance and energy for resource-intensive applications with flexible accuracy or precision requirements is extremely difficult. We present ApproxTuner, an automatic framework for accuracy-aware optimization of tensor-based applications while requiring only high-level end-to-end quality specifications. ApproxTuner implements and manages approximations in algorithms, system software, and hardware. The key contribution in ApproxTuner is a novel three-phase approach to approximation-tuning that consists of development-time, install-time, and run-time phases. Our approach decouples tuning of hardware-independent and hardware-specific approximations, thus providing retargetability across devices. To enable efficient autotuning of approximation choices, we present a novel accuracy-aware tuning technique called predictive approximation-tuning, which significantly speeds up autotuning by analytically predicting the accuracy impacts of approximations. We evaluate ApproxTuner across 10 convolutional neural networks (CNNs) and a combined CNN and image processing benchmark. For the evaluated CNNs, using only hardware-independent approximation choices we achieve a mean speedup of 2.1x (max 2.7x) on a GPU, and 1.3x mean speedup (max 1.9x) on the CPU, while staying within 1 percentage point of inference accuracy loss. For two different accuracy-prediction models, ApproxTuner speeds up tuning by 12.8x and 20.4x compared to conventional empirical tuning while achieving comparable benefits.
KW - approximate computing
KW - compilers
KW - deep neural networks
KW - heterogeneous systems
UR - http://www.scopus.com/inward/record.url?scp=85101668320&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101668320&partnerID=8YFLogxK
U2 - 10.1145/3437801.3446108
DO - 10.1145/3437801.3446108
M3 - Conference contribution
AN - SCOPUS:85101668320
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 262
EP - 277
BT - PPoPP 2021 - Proceedings of the 2021 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
Y2 - 27 February 2021 through 3 March 2021
ER -