Abstract
Processing-in-memory (PIM) has been proposed to improve the performance of bandwidth-intensive workloads and to save energy by reducing data movement between compute and memory. To realize PIM, programmable computing units were integrated with memory cores on an HBM2 device to enable parallel processing and minimize data movement. A graphics processing unit (GPU) system equipped with Samsung Aquabolt-XL HBM2-PIM devices improved the performance of a general matrix-vector multiplication (GEMV) microkernel and a speech recognition application by 8.9× and 3.5×, respectively, and reduced energy consumption by over 60%. In a Xilinx Alveo U280 system, the performance of the GEMV and ADD microkernels improved by 2.8×, and that of a long short-term memory (LSTM) workload improved by 2.54×. Simulations show that a performance gain of over 2.3× may be attained in a system with LP5-PIM for certain transformer-based speech recognition workloads, with an energy reduction of 86%. In addition, AXDIMM, a DIMM-level PIM with acceleration buffers, exhibits an 80% performance improvement and 42.6% energy savings over a regular RDIMM system.
Original language | English (US) |
---|---|
Pages (from-to) | 20-30 |
Number of pages | 11 |
Journal | IEEE Micro |
Volume | 42 |
Issue number | 3 |
State | Published - 2022 |
Keywords
- HBM2
- LPDDR5
- PIM
- Processing-In-Memory
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Electrical and Electronic Engineering