TY - GEN
T1 - MIMDRAM
T2 - 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
AU - Oliveira, Geraldo F.
AU - Olgun, Ataberk
AU - Yağlıkçı, Abdullah Giray
AU - Bostancı, F. Nisa
AU - Gómez-Luna, Juan
AU - Ghose, Saugata
AU - Mutlu, Onur
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide (e.g., 16,384- to 262,144-bit-wide) data-parallel operations in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limits the effectiveness and applicability of PUD in three ways. First, since applications have varying degrees of SIMD parallelism (which is often smaller than the DRAM row granularity), PUD execution often leads to underutilization, throughput loss, and energy waste. Second, due to the high area cost of implementing interconnects that connect columns in a wide DRAM row, most PUD architectures are limited to the execution of parallel map operations, where a single operation is performed over equally-sized input and output arrays. Third, the need to feed the wide DRAM row with tens of thousands of data elements, combined with the lack of adequate compiler support for PUD systems, creates a programmability barrier, since programmers need to manually extract SIMD parallelism from an application and map computation to the PUD hardware. Our goal is to design a flexible PUD system that overcomes the limitations caused by the large and rigid granularity of PUD. To this end, we propose MIMDRAM, a hardware/software co-designed PUD system that introduces new mechanisms to allocate and control only the necessary resources for a given PUD operation. The key idea of MIMDRAM is to leverage fine-grained DRAM (i.e., the ability to independently access smaller segments of a large DRAM row) for PUD computation. MIMDRAM exploits this key idea to enable a multiple-instruction multiple-data (MIMD) execution model in each DRAM subarray (and SIMD execution within each DRAM row segment). We evaluate MIMDRAM using twelve real-world applications and 495 multi-programmed application mixes. Our evaluation shows that MIMDRAM provides 34× the performance, 14.3× the energy efficiency, 1.7× the throughput, and 1.3× the fairness of a state-of-the-art PUD framework, along with 30.6× and 6.8× the energy efficiency of a high-end CPU and GPU, respectively. MIMDRAM adds small area cost to a DRAM chip (1.11%) and CPU die (0.6%). We hope and believe that MIMDRAM's ideas and results will help to enable more efficient and easy-to-program PUD systems. To this end, we open source MIMDRAM at https://github.com/CMU-SAFARI/MIMDRAM.
KW - DRAM
KW - energy-efficiency
KW - hardware/software co-design
KW - memory-centric computing
KW - processing-in-memory
UR - http://www.scopus.com/inward/record.url?scp=85185374754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185374754&partnerID=8YFLogxK
U2 - 10.1109/HPCA57654.2024.00024
DO - 10.1109/HPCA57654.2024.00024
M3 - Conference contribution
AN - SCOPUS:85185374754
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 186
EP - 203
BT - Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
PB - IEEE Computer Society
Y2 - 2 March 2024 through 6 March 2024
ER -