TY - GEN
T1 - SIMDRAM
T2 - 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021
AU - Hajinazar, Nastaran
AU - Oliveira, Geraldo F.
AU - Gregorio, Sven
AU - Ferreira, Joaõ Dinis
AU - Ghiasi, Nika Mansouri
AU - Patel, Minesh
AU - Alser, Mohammed
AU - Ghose, Saugata
AU - Gómez-Luna, Juan
AU - Mutlu, Onur
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/4/19
Y1 - 2021/4/19
N2 - Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism tosupport the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes. We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Our evaluations using a single DRAM bank show that (1) over 16 operations, SIMDRAM provides 2.0X the throughput and 2.6X the energy efficiency of Ambit, a state-of-the-art processing-using-DRAM mechanism; (2) over seven real-world applications, SIMDRAM provides 2.5X the performance of Ambit. Using 16 DRAM banks, SIMDRAM provides (1) 88X and 5.8X the throughput, and 257X and 31X the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21X and 2.1X the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
AB - Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism tosupport the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes. We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Our evaluations using a single DRAM bank show that (1) over 16 operations, SIMDRAM provides 2.0X the throughput and 2.6X the energy efficiency of Ambit, a state-of-the-art processing-using-DRAM mechanism; (2) over seven real-world applications, SIMDRAM provides 2.5X the performance of Ambit. Using 16 DRAM banks, SIMDRAM provides (1) 88X and 5.8X the throughput, and 257X and 31X the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21X and 2.1X the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
KW - Bulk Bitwise Operations
KW - DRAM
KW - Energy
KW - Performance
KW - Processing-Using-Memory
KW - Processing-in-Memory
UR - http://www.scopus.com/inward/record.url?scp=85104698559&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104698559&partnerID=8YFLogxK
U2 - 10.1145/3445814.3446749
DO - 10.1145/3445814.3446749
M3 - Conference contribution
AN - SCOPUS:85104698559
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 329
EP - 345
BT - Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021
PB - Association for Computing Machinery
Y2 - 19 April 2021 through 23 April 2021
ER -