TY - GEN
T1 - SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs
T2 - 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
AU - Gong, Zhangxiaowen
AU - Ji, Houxiang
AU - Fletcher, Christopher W.
AU - Hughes, Christopher J.
AU - Baghsorkhi, Sara
AU - Torrellas, Josep
N1 - Funding Information:
This work was funded in part by NSF under grants CNS 1956007, CNS 1763658, and CCF 1725734.
Publisher Copyright:
© 2020 IEEE Computer Society. All rights reserved.
PY - 2020/10
Y1 - 2020/10
N2 - General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips ineffectual computation due to sparsity in dense DNN implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and mixed-precision kernels with unstructured sparsity from both weights and activations. Further, SAVE is not DNN-specific and can potentially speed up any vector workload with sparsity. To evaluate SAVE, we use simulations of a 28-core machine and run VGG16, ResNet-50, and GNMT, with and without pruning. With realistic sparsity, SAVE accelerates inference by 1.37x-1.68x and end-to-end training by 1.28x-1.64x.
AB - General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips ineffectual computation due to sparsity in dense DNN implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and mixed-precision kernels with unstructured sparsity from both weights and activations. Further, SAVE is not DNN-specific and can potentially speed up any vector workload with sparsity. To evaluate SAVE, we use simulations of a 28-core machine and run VGG16, ResNet-50, and GNMT, with and without pruning. With realistic sparsity, SAVE accelerates inference by 1.37x-1.68x and end-to-end training by 1.28x-1.64x.
UR - http://www.scopus.com/inward/record.url?scp=85094219441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094219441&partnerID=8YFLogxK
U2 - 10.1109/MICRO50266.2020.00070
DO - 10.1109/MICRO50266.2020.00070
M3 - Conference contribution
AN - SCOPUS:85094219441
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 796
EP - 810
BT - Proceedings - 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
PB - IEEE Computer Society
Y2 - 17 October 2020 through 21 October 2020
ER -