25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications

Young Cheon Kwon, Suk Han Lee, Jaehoon Lee, Sang Hyuk Kwon, Je Min Ryu, Jong Pil Son, O. Seongil, Hak Soo Yu, Haesuk Lee, Soo Young Kim, Youngmin Cho, Jin Guk Kim, Jongyoon Choi, Hyun Sung Shin, Jin Kim, Beng Seng Phuah, Hyoung Min Kim, Myeong Jun Song, Ahn Choi, Daeho KimSoo Young Kim, Eun Bong Kim, David Wang, Shinhaeng Kang, Yuhwan Ro, Seungwoo Seo, Joon Ho Song, Jaeyoun Youn, Kyomin Sohn, Nam Sung Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into application areas such as speech recognition, health care, and autonomous driving. To increase the capabilities of AI more powerful systems are needed to process a larger amount of data. This requirement has made domain-specific accelerators, such as GPUs and TPUs, popular; as they can provide orders of magnitude higher performance than state-of-the-art CPUs. However, these accelerators can only operate at their peak performance when they get the necessary data from memory as quickly as it is processed: requiring off-chip memory with a high bandwidth and a large capacity [1]. HBM has thus far met the bandwidth and capacity requirement [2] -[6], but recent AI technologies such as recurrent neural networks require an even higher bandwidth than HBM [7]-[8]. While a further increase in off-chip bandwidth can be accomplished by various techniques, it is often limited by power constraints at the chip or system level [9]. Hence, it is essential to decrease demand for off-chip bandwidth with unconventional architectures: such as processing-in-memory. In this paper, we present function-In-memory DRAM (FIMDRAM) that integrates a 16-wide single-instruction multiple-data engine within the memory banks and that exploits bank-level parallelism to provide 4 \times higher processing bandwidth than an off-chip memory solution. Second, we show techniques that do not require any modification to conventional memory controllers and their command protocols, which make FIMDRAM more practical for quick industry adoption. Finally, we conclude this paper with circuit- and system-level evaluations of our fabricated FIMDRAM.

Original languageEnglish (US)
Title of host publication2021 IEEE International Solid-State Circuits Conference, ISSCC 2021 - Digest of Technical Papers
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages350-352
Number of pages3
ISBN (Electronic)9781728195490
DOIs
StatePublished - Feb 13 2021
Externally publishedYes
Event2021 IEEE International Solid-State Circuits Conference, ISSCC 2021 - San Francisco, United States
Duration: Feb 13 2021Feb 22 2021

Publication series

NameDigest of Technical Papers - IEEE International Solid-State Circuits Conference
Volume64
ISSN (Print)0193-6530

Conference

Conference2021 IEEE International Solid-State Circuits Conference, ISSCC 2021
CountryUnited States
CitySan Francisco
Period2/13/212/22/21

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of '25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications'. Together they form a unique fingerprint.

Cite this