Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Sitao Huang, Aayush Ankit, Plinio Silveira, Rodrigo Antunes, Sai Rahul Chalamalasetti, Izzat El Hajj, Dong Eun Kim, Glaucimar Aguiar, Pedro Bruel, Sergey Serebryakov, Cong Xu, Can Li, Paolo Faraboschi, John Paul Strachan, Deming Chen, Kaushik Roy, Wen Mei Hwu, Dejan Milojicic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication (MVM) operations with low latency and energy consumption. However, these crossbars require the use of ADCs, which constitute a significant fraction of the cost of MVM operations. The overhead of ADCs can be mitigated via partial sum quantization. However, prior quantization flows for DNN inference accelerators do not consider partial sum quantization, which is not highly relevant to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators where weight quantization, input quantization, and partial sum quantization are jointly applied for each DNN layer. We also propose an automated quantization flow powered by deep reinforcement learning to search for the best quantization configuration in the large design space. Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while only losing 1.18% in DNN inference accuracy.
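
To make the three quantization points in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a uniform symmetric quantizer, a worst-case partial-sum range, and a matrix split along the input dimension into crossbar-sized chunks whose partial sums are quantized (standing in for the ADC) before digital accumulation. The names `quantize`, `crossbar_mvm`, `rows_per_xbar`, and the per-layer bit widths in `layer_config` are hypothetical and chosen only for illustration.

```python
def quantize(x, bits, max_abs):
    """Uniform symmetric quantization of x onto a signed grid of `bits` bits.

    Assumption: values are clipped to [-max_abs, max_abs].
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed uniform grid")
    if max_abs == 0:
        return 0.0
    levels = 2 ** (bits - 1) - 1          # number of positive levels
    step = max_abs / levels               # grid spacing
    q = max(-levels, min(levels, round(x / step)))
    return q * step


def crossbar_mvm(weights, inputs, rows_per_xbar, w_bits, in_bits, psum_bits):
    """Matrix-vector product with weight, input, and partial-sum quantization.

    weights: list of output rows, each a list of len(inputs) floats.
    rows_per_xbar: how many input elements one (hypothetical) crossbar covers.
    """
    w_max = max(abs(w) for row in weights for w in row)
    in_max = max(abs(v) for v in inputs)

    # Weight and input quantization (applied once per layer).
    qw = [[quantize(w, w_bits, w_max) for w in row] for row in weights]
    qx = [quantize(v, in_bits, in_max) for v in inputs]

    # Assumed worst-case magnitude of one crossbar's partial sum.
    psum_max = rows_per_xbar * w_max * in_max

    outputs = []
    for row in qw:
        acc = 0.0
        for start in range(0, len(qx), rows_per_xbar):
            stop = min(start + rows_per_xbar, len(qx))
            psum = sum(row[i] * qx[i] for i in range(start, stop))
            # Partial-sum quantization models the limited ADC resolution.
            acc += quantize(psum, psum_bits, psum_max)
        outputs.append(acc)
    return outputs


# Hypothetical per-layer configuration: (weight, input, partial-sum) bits.
layer_config = {"layer1": (8, 8, 8), "layer2": (4, 6, 5)}

W = [[0.5, -0.25, 0.75, 1.0]]
x = [1.0, 0.5, -0.5, 0.25]
for name, (w_bits, in_bits, psum_bits) in layer_config.items():
    print(name, crossbar_mvm(W, x, 2, w_bits, in_bits, psum_bits))
```

Lowering `psum_bits` coarsens each crossbar's output before accumulation, which is the accuracy cost that a per-layer search would have to weigh against the ADC savings.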

Original language: English (US)
Title of host publication: Proceedings of the 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 372-377
Number of pages: 6
ISBN (Electronic): 9781450379991
DOIs
State: Published - Jan 18 2021
Event: 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021 - Virtual, Online, Japan
Duration: Jan 18 2021 to Jan 21 2021

Publication series

Name: Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Conference

Conference: 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021
Country/Territory: Japan
City: Virtual, Online
Period: 1/18/21 to 1/21/21

Keywords

  • DNN inference accelerators
  • Mixed precision quantization
  • ReRAM

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
