TY - JOUR
T1 - Deep In-Memory Architectures in SRAM
T2 - An Analog Approach to Approximate Computing
AU - Kang, Mingu
AU - Gonugondla, Sujan K.
AU - Shanbhag, Naresh R.
N1 - Funding Information:
Dr. Gonugondla was a recipient of the Dr. Ok Kyun Kim Fellowship 2018–2019 and the M. E. Van Valkenburg Graduate Research Award 2019–2020 from the Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, the ADI Outstanding Student Designer Award in 2018, and the SSCS Predoctoral Achievement Award in 2020. He received the Best Student Paper Awards at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) in 2016 and the International Symposium on Circuits and Systems (ISCAS) in 2018.
Funding Information:
Manuscript received January 30, 2020; revised May 21, 2020, August 16, 2020, September 13, 2020, and October 11, 2020; accepted October 21, 2020. Date of publication November 9, 2020; date of current version November 20, 2020. This work was supported by the Systems On Nanoscale Information fabriCs (SONIC) and the Center for Brain-Inspired Computing (C-BRIC) funded by the Semiconductor Research Corporation (SRC) and the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Mingu Kang.) Mingu Kang is with the Department of Electrical and Computer Engineering, University of California at San Diego (UCSD), La Jolla, CA 92093 USA (e-mail: m7kang@ucsd.edu). Sujan K. Gonugondla is with Amazon, Seattle, WA 98121 USA (e-mail: gsujan@amazon.com). Naresh R. Shanbhag is with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: shanbhag@illinois.edu).
Publisher Copyright:
© 1963-2012 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - This article provides an overview of recently proposed deep in-memory architectures (DIMAs) in SRAM for energy- and latency-efficient hardware realization of machine learning (ML) algorithms. DIMA tackles the data movement problem in von Neumann architectures head-on by deeply embedding mixed-signal computations into a conventional memory array. In doing so, it trades off its computational signal-to-noise ratio (compute SNR) with energy and latency, and therefore, it represents an analog form of approximate computing. DIMA exploits the inherent error immunity of ML algorithms and SNR budgeting methods to operate its analog circuitry in a low-swing/low-compute SNR regime, thereby achieving > 100× reduction in the energy-delay product (EDP) over an equivalent von Neumann architecture with no loss in inference accuracy. This article describes DIMA's computational pipeline and provides a Shannon-inspired rationale for its robustness to process, temperature, and voltage variations and design guidelines to manage its analog nonidealities. DIMA's versatility, effectiveness, and practicality, demonstrated via multiple silicon IC prototypes in a 65-nm CMOS process, are described. A DIMA-based instruction set architecture (ISA) to realize an end-to-end application-to-architecture mapping for accelerating diverse ML algorithms is also presented. Finally, DIMA's fundamental tradeoff between energy and accuracy in the low-compute SNR regime is analyzed to determine energy-optimum design parameters.
AB - This article provides an overview of recently proposed deep in-memory architectures (DIMAs) in SRAM for energy- and latency-efficient hardware realization of machine learning (ML) algorithms. DIMA tackles the data movement problem in von Neumann architectures head-on by deeply embedding mixed-signal computations into a conventional memory array. In doing so, it trades off its computational signal-to-noise ratio (compute SNR) with energy and latency, and therefore, it represents an analog form of approximate computing. DIMA exploits the inherent error immunity of ML algorithms and SNR budgeting methods to operate its analog circuitry in a low-swing/low-compute SNR regime, thereby achieving > 100× reduction in the energy-delay product (EDP) over an equivalent von Neumann architecture with no loss in inference accuracy. This article describes DIMA's computational pipeline and provides a Shannon-inspired rationale for its robustness to process, temperature, and voltage variations and design guidelines to manage its analog nonidealities. DIMA's versatility, effectiveness, and practicality, demonstrated via multiple silicon IC prototypes in a 65-nm CMOS process, are described. A DIMA-based instruction set architecture (ISA) to realize an end-to-end application-to-architecture mapping for accelerating diverse ML algorithms is also presented. Finally, DIMA's fundamental tradeoff between energy and accuracy in the low-compute SNR regime is analyzed to determine energy-optimum design parameters.
KW - Accelerator
KW - artificial intelligence
KW - energy efficiency
KW - in-memory computing
KW - machine learning (ML)
KW - non-von Neumann
UR - http://www.scopus.com/inward/record.url?scp=85096401586&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096401586&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2020.3034117
DO - 10.1109/JPROC.2020.3034117
M3 - Article
AN - SCOPUS:85096401586
SN - 0018-9219
VL - 108
SP - 2251
EP - 2275
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
IS - 12
M1 - 9252843
ER -