Save: Sparsity-aware vector engine for accelerating DNN training and inference on CPUs

Zhangxiaowen Gong, Houxiang Ji, Christopher W. Fletcher, Christopher J. Hughes, Sara Baghsorkhi, Josep Torrellas

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips ineffectual computation due to sparsity in dense DNN implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and mixed-precision kernels with unstructured sparsity from both weights and activations. Further, SAVE is not DNNspecific and can potentially speed-up any vector workload with sparsity. To evaluate SAVE, we use simulations of a 28-core machine and run VGG16, ResNet-50, and GNMT, with and without pruning. With realistic sparsity, SAVE accelerates inference by 1.37x-1.68x and end-to-end training by 1.28x-1.64x.

Original languageEnglish (US)
Title of host publicationProceedings - 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
PublisherIEEE Computer Society
Pages796-810
Number of pages15
ISBN (Electronic)9781728173832
DOIs
StatePublished - Oct 2020
Event53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020 - Virtual, Athens, Greece
Duration: Oct 17 2020Oct 21 2020

Publication series

NameProceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume2020-October
ISSN (Print)1072-4451

Conference

Conference53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
CountryGreece
CityVirtual, Athens
Period10/17/2010/21/20

ASJC Scopus subject areas

  • Hardware and Architecture

Fingerprint Dive into the research topics of 'Save: Sparsity-aware vector engine for accelerating DNN training and inference on CPUs'. Together they form a unique fingerprint.

Cite this