TY - JOUR
T1 - Accelerating Neural Network Inference with Processing-in-DRAM
T2 - From the Edge to the Cloud
AU - Oliveira, Geraldo F.
AU - Gomez-Luna, Juan
AU - Ghose, Saugata
AU - Boroumand, Amirali
AU - Mutlu, Onur
PY - 2022/8/29
Y1 - 2022/8/29
N2 - Neural networks (NNs) are growing in importance and complexity. An NN's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip, (2) Mensa, a 3D-stacking-based PIM architecture tailored for edge devices, and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (i) UPMEM provides 23× the performance of a high-end GPU when the GPU requires memory oversubscription for a GEMV kernel, (ii) Mensa improves energy efficiency and throughput by 3.0× and 3.1× over the baseline Edge TPU for 24 Google edge NN models, and (iii) SIMDRAM outperforms a CPU/GPU by 16.7×/1.4× for three binary NNs. We conclude that, due to their inherent limitations, each PIM architecture best suits the execution of NN models with distinct attributes.
KW - Analytical models
KW - Artificial neural networks
KW - Computational modeling
KW - Computer architecture
KW - Energy efficiency
KW - Random access memory
KW - Throughput
UR - http://www.scopus.com/inward/record.url?scp=85137570613&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137570613&partnerID=8YFLogxK
U2 - 10.1109/MM.2022.3202350
DO - 10.1109/MM.2022.3202350
M3 - Article
AN - SCOPUS:85137570613
SN - 0272-1732
VL - 42
SP - 1
EP - 14
JO - IEEE Micro
JF - IEEE Micro
IS - 6
ER -