TY - GEN
T1 - In-DRAM near-data approximate acceleration for GPUs
AU - Yazdanbakhsh, Amir
AU - Song, Choungki
AU - Sacks, Jacob
AU - Lotfi-Kamran, Pejman
AU - Esmaeilzadeh, Hadi
AU - Kim, Nam Sung
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - GPUs are bottlenecked by the off-chip communication bandwidth and its energy cost; hence near-data acceleration is particularly attractive for GPUs. Integrating the accelerators within DRAM can mitigate these bottlenecks and additionally expose them to the higher internal bandwidth of DRAM. However, such an integration is challenging, as it requires low-overhead accelerators while supporting a diverse set of applications. To enable the integration, this work leverages the approximability of GPU applications and utilizes the neural transformation, which converts diverse regions of code mainly to Multiply-Accumulate (MAC) operations. Furthermore, to preserve the SIMT execution model of GPUs, we also propose a novel approximate MAC unit with a significantly smaller area overhead. As such, this work introduces AXRAM, a novel DRAM architecture that integrates several approximate MAC units. AXRAM offers this integration without increasing the memory column pitch or modifying the internal architecture of the DRAM banks. Our results with 10 GPGPU benchmarks show that, on average, AXRAM provides a 2.6× speedup and 13.3× energy reduction over a baseline GPU with no acceleration. These benefits are achieved while reducing the overall DRAM system power by 26% with an area cost of merely 2.1%.
UR - http://www.scopus.com/inward/record.url?scp=85061555376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061555376&partnerID=8YFLogxK
U2 - 10.1145/3243176.3243188
DO - 10.1145/3243176.3243188
M3 - Conference contribution
AN - SCOPUS:85061555376
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
BT - Proceedings - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018
Y2 - 1 November 2018 through 4 November 2018
ER -