Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Geraldo F. Oliveira, Juan Gomez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Research output: Contribution to journal › Article › peer-review

Abstract

Neural networks (NNs) are growing in importance and complexity. An NN's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip, (2) Mensa, a 3D-stacking-based PIM architecture tailored for edge devices, and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (i) UPMEM provides 23× the performance of a high-end GPU when the GPU requires memory oversubscription for a GEMV kernel, (ii) Mensa improves energy efficiency and throughput by 3.0× and 3.1× over the baseline Edge TPU for 24 Google edge NN models, and (iii) SIMDRAM outperforms a CPU/GPU by 16.7×/1.4× for three binary NNs. We conclude that, due to their natural limitations, each PIM architecture better suits the execution of NN models with distinct attributes.

Original language: English (US)
Pages (from-to): 1-14
Number of pages: 14
Journal: IEEE Micro
Volume: 42
Issue number: 6
State: E-pub ahead of print - Aug 29, 2022

Keywords

  • Analytical models
  • Artificial neural networks
  • Computational modeling
  • Computer architecture
  • Energy efficiency
  • Random access memory
  • Throughput

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering
