TY - GEN
T1 - Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems
T2 - 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016
AU - Asghari-Moghaddam, Hadi
AU - Son, Young Hoon
AU - Ahn, Jung Ho
AU - Kim, Nam Sung
N1 - Funding Information:
This work was supported in part by generous grants from Samsung Semiconductor (2016-03835), NSF (CNS-1217102), DARPA (HR0011-12-2-0019), and MOTIE/KSRC (Future Semiconductor Device Technology Development Program-10044735).
Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/14
Y1 - 2016/12/14
N2 - The performance of computer systems is often limited by the bandwidth of their memory channels, but further increasing the bandwidth is challenging under the stringent pin and power constraints of packages. To further increase performance under these constraints, various near-DRAM acceleration (NDA) architectures, which tightly integrate accelerators with DRAM devices using 3D/2.5D-stacking technology, have been proposed. However, they have not yet prevailed because they often rely on expensive HBM/HMC-like DRAM devices that also suffer from limited capacity, whereas scalable memory capacity is critical for some computing segments such as servers. In this paper, we first demonstrate that data buffers in a load-reduced DIMM (LRDIMM), which was originally developed to support large memory systems for servers, are ideal places to integrate near-DRAM accelerators. Second, we propose Chameleon, an NDA architecture that can be realized without relying on 3D/2.5D-stacking technology and seamlessly integrated with large memory systems for servers. Third, we explore three microarchitectures that mitigate the constraints imposed by adopting the LRDIMM architecture for NDA. Our experiments demonstrate that a Chameleon-based system can offer 2.13× higher geo-mean performance while consuming 34% lower geo-mean data transfer energy than a system that integrates the same accelerator logic within the processor.
AB - The performance of computer systems is often limited by the bandwidth of their memory channels, but further increasing the bandwidth is challenging under the stringent pin and power constraints of packages. To further increase performance under these constraints, various near-DRAM acceleration (NDA) architectures, which tightly integrate accelerators with DRAM devices using 3D/2.5D-stacking technology, have been proposed. However, they have not yet prevailed because they often rely on expensive HBM/HMC-like DRAM devices that also suffer from limited capacity, whereas scalable memory capacity is critical for some computing segments such as servers. In this paper, we first demonstrate that data buffers in a load-reduced DIMM (LRDIMM), which was originally developed to support large memory systems for servers, are ideal places to integrate near-DRAM accelerators. Second, we propose Chameleon, an NDA architecture that can be realized without relying on 3D/2.5D-stacking technology and seamlessly integrated with large memory systems for servers. Third, we explore three microarchitectures that mitigate the constraints imposed by adopting the LRDIMM architecture for NDA. Our experiments demonstrate that a Chameleon-based system can offer 2.13× higher geo-mean performance while consuming 34% lower geo-mean data transfer energy than a system that integrates the same accelerator logic within the processor.
UR - http://www.scopus.com/inward/record.url?scp=85006762808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006762808&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2016.7783753
DO - 10.1109/MICRO.2016.7783753
M3 - Conference contribution
AN - SCOPUS:85006762808
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
BT - MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture
PB - IEEE Computer Society
Y2 - 15 October 2016 through 19 October 2016
ER -