TY - JOUR
T1 - A 0.44-μJ/dec, 39.9-μs/dec, Recurrent Attention In-Memory Processor for Keyword Spotting
AU - Dbouk, Hassan
AU - Gonugondla, Sujan K.
AU - Sakr, Charbel
AU - Shanbhag, Naresh R.
N1 - Funding Information:
Manuscript received July 3, 2020; revised September 13, 2020; accepted October 5, 2020. Date of publication October 26, 2020; date of current version June 29, 2021. This article was approved by Associate Editor Dennis Sylvester. This work was supported in part by AFRL and DARPA under Grant FA8650-18-2-7866 as part of the FRANC Program and in part by Sandia National Laboratories. (Corresponding author: Hassan Dbouk.) Hassan Dbouk, Charbel Sakr, and Naresh R. Shanbhag are with the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA.
Publisher Copyright:
© 1966-2012 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - This article presents a deep learning-based classifier IC for keyword spotting (KWS) in 65-nm CMOS designed using an algorithm-hardware co-design approach. First, a recurrent attention model (RAM) algorithm for the KWS task (the KeyRAM algorithm) is proposed. The KeyRAM algorithm enables accuracy versus energy scalability via a confidence-based computation (CC) scheme, leading to a 2.5× reduction in computational complexity compared to state-of-the-art (SOTA) neural networks, and is well-suited for in-memory computing (IMC) since the bulk (89%) of its computations are 4-b matrix-vector multiplies. The KeyRAM IC comprises a multi-bit multi-bank IMC architecture with a digital co-processor. A sparsity-aware summation scheme is proposed to alleviate the challenge faced by IMCs when summing sparse activations. The digital co-processor employs diagonal major weight storage to compute without any stalls. This combination of the IMC and digital processors enables a balanced tradeoff between energy efficiency and high-accuracy computation. The resultant KWS IC achieves SOTA decision latency of 39.9 μs with a decision energy < 0.5 μJ/dec, which translates to more than 24× savings in the energy-delay product (EDP) of decisions over existing KWS ICs.
AB - This article presents a deep learning-based classifier IC for keyword spotting (KWS) in 65-nm CMOS designed using an algorithm-hardware co-design approach. First, a recurrent attention model (RAM) algorithm for the KWS task (the KeyRAM algorithm) is proposed. The KeyRAM algorithm enables accuracy versus energy scalability via a confidence-based computation (CC) scheme, leading to a 2.5× reduction in computational complexity compared to state-of-the-art (SOTA) neural networks, and is well-suited for in-memory computing (IMC) since the bulk (89%) of its computations are 4-b matrix-vector multiplies. The KeyRAM IC comprises a multi-bit multi-bank IMC architecture with a digital co-processor. A sparsity-aware summation scheme is proposed to alleviate the challenge faced by IMCs when summing sparse activations. The digital co-processor employs diagonal major weight storage to compute without any stalls. This combination of the IMC and digital processors enables a balanced tradeoff between energy efficiency and high-accuracy computation. The resultant KWS IC achieves SOTA decision latency of 39.9 μs with a decision energy < 0.5 μJ/dec, which translates to more than 24× savings in the energy-delay product (EDP) of decisions over existing KWS ICs.
KW - In-memory computing (IMC)
KW - keyword spotting (KWS)
KW - machine learning
KW - recurrent attention networks
UR - http://www.scopus.com/inward/record.url?scp=85112467479&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112467479&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2020.3029586
DO - 10.1109/JSSC.2020.3029586
M3 - Article
AN - SCOPUS:85112467479
SN - 0018-9200
VL - 56
SP - 2234
EP - 2244
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 7
M1 - 9239367
ER -