Processing-in-memory (PIM) has been proposed to improve the performance of bandwidth-intensive workloads and to save energy by reducing data movement between compute and memory. To realize PIM, programmable computing units were integrated with the memory cores of an HBM2 device, enabling parallel processing while minimizing data movement. A graphics processing unit (GPU) system equipped with Samsung Aquabolt-XL HBM2-PIM devices improved the performance of a general matrix-vector multiplication (GEMV) microkernel and of speech recognition applications by 8.9× and 3.5×, respectively, and reduced energy consumption by over 60%. In a Xilinx Alveo U280 system, the performance of GEMV and ADD microkernel workloads improved by 2.8×, and that of a long short-term memory (LSTM) workload improved by 2.54×. Simulations show that a system with LP5-PIM may attain a performance gain of over 2.3× for certain transformer-based speech recognition workloads, with an energy reduction of 86%. In addition, AXDIMM, a DIMM-level PIM with acceleration buffers, exhibits an 80% performance improvement and 42.6% energy savings over a regular RDIMM system.
ASJC Scopus subject areas
- Hardware and Architecture
- Electrical and Electronic Engineering