Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM

Liu Ke, Xuan Zhang, Jinin So, Jong Geon Lee, Shin Haeng Kang, Sukhan Lee, Songyi Han, Yeongon Cho, Jin Hyun Kim, Yongsuk Kwon, Kyungsoo Kim, Jin Jung, Ilkwon Yun, Sung Joo Park, Hyunsun Park, Joonho Song, Jeonghyeon Cho, Kyomin Sohn, Nam Sung Kim, Hsien Hsin S. Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Near-memory processing (NMP) is a prospective paradigm enabling memory-centric computing. By moving compute capability next to the main memory (DRAM modules), it can fundamentally address the CPU-memory bandwidth bottleneck and thus effectively improve the performance of memory-constrained workloads. Using the personalized recommendation system as a driving example, we developed a scalable, practical DIMM-based NMP solution tailored to accelerating inference serving. Our solution is demonstrated on a versatile FPGA-enabled NMP platform called AxDIMM, which allows rapid prototyping and evaluation of NMP's performance potential on real hardware, under a realistic system setting, using an industry-representative recommendation framework. We experimentally validated the performance of a two-ranked AxDIMM prototype, which achieves up to 1.89× speedup in latency and 31.6% memory energy saving for embedding operations. For end-to-end recommendation inference serving, AxDIMM improves throughput by up to 1.5× and latency-bounded throughput by up to 1.77×.

Original language: English (US)
Pages (from-to): 116-127
Number of pages: 12
Journal: IEEE Micro
Volume: 42
Issue number: 1
State: Published - 2022
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering
