An FPGA-based RNN-T inference accelerator with PIM-HBM

Shin Haeng Kang, Sukhan Lee, Byeongho Kim, Hweesoo Kim, Kyomin Sohn, Nam Sung Kim, Eojin Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we implement a world-first RNN-T inference accelerator using an FPGA with PIM-HBM, which can multiply the internal bandwidth of the memory. The accelerator offloads the matrix-vector multiplication (GEMV) operations of the LSTM layers in RNN-T to PIM-HBM, and PIM-HBM significantly reduces the execution time of GEMV by exploiting the HBM internal bandwidth. One of the most important constraints in exploiting PIM-HBM is that memory commands must be issued in a pre-defined order. To guarantee this, we implement a direct memory access (DMA) module and change the configuration of the on-chip memory controller, taking advantage of the flexibility and reconfigurability of the FPGA. In addition, we design the other hardware modules needed for acceleration, namely non-linear functions (i.e., sigmoid and hyperbolic tangent), an element-wise operation module, and a ReLU module, to execute these compute-bound RNN-T operations on the FPGA. For this, we prepare FP16-quantized weights and MLPerf input datasets, and modify the PCIe device driver and the C++-based control code. In our evaluation, our accelerator with PIM-HBM reduces the execution time of RNN-T by 2.5× on average with an 11.09% reduction in LUT usage, and improves energy efficiency by up to 2.6× compared to the baseline.
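As a rough illustration of how the LSTM workload splits, the following Python sketch (not the authors' code) computes one LSTM time step. The two GEMVs over the stacked gate weights correspond to the part the accelerator is described as offloading to PIM-HBM, while the sigmoid, tanh, and element-wise products correspond to the modules built on the FPGA. The helper names (`to_fp16`, `gemv`, `lstm_step`), the i, f, g, o gate ordering, and the tiny dimensions are assumptions for illustration only; the sketch runs on a host CPU and does not model PIM-HBM command ordering.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def gemv(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Dense matrix-vector product y = M v (the memory-bound stage)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(w, u, b, x, h_prev, c_prev):
    """One LSTM time step with hidden size H (hypothetical helper).

    w: 4H x D input weights, u: 4H x H recurrent weights, b: 4H biases.
    Gate rows are assumed to be stacked in the order i, f, g, o.
    """
    hidden = len(h_prev)
    # Memory-bound part: two GEMVs over the stacked gate weights.
    wx = gemv(w, x)
    uh = gemv(u, h_prev)
    pre = [a + r + bias for a, r, bias in zip(wx, uh, b)]
    # Compute-bound part: non-linear functions and element-wise operations.
    i = [sigmoid(z) for z in pre[0:hidden]]
    f = [sigmoid(z) for z in pre[hidden:2 * hidden]]
    g = [math.tanh(z) for z in pre[2 * hidden:3 * hidden]]
    o = [sigmoid(z) for z in pre[3 * hidden:4 * hidden]]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Hypothetical tiny example: D = 2 inputs, H = 1 hidden unit, FP16-rounded weights.
w = [[to_fp16(v) for v in row]
     for row in [[0.5, -0.25], [0.1, 0.2], [0.3, 0.3], [-0.4, 0.6]]]
u = [[to_fp16(v)] for v in [0.2, 0.1, -0.3, 0.5]]
b = [0.0, 0.0, 0.0, 0.0]
h, c = lstm_step(w, u, b, [1.0, 2.0], [0.0], [0.0])
```

In this sketch every weight is read once per time step and used for a single multiply-add, which is why the GEMV stage is limited by memory bandwidth rather than arithmetic, while the gate non-linearities touch only O(H) values.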

Original language: English (US)
Title of host publication: FPGA 2022 - Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Publisher: Association for Computing Machinery, Inc
Pages: 146-152
Number of pages: 7
ISBN (Electronic): 9781450391498
State: Published - Feb 13 2022
Externally published: Yes
Event: 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2022 - Virtual, Online, United States
Duration: Feb 27 2022 to Mar 1 2022

Publication series

Name: FPGA 2022 - Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Conference

Conference: 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2022
Country/Territory: United States
City: Virtual, Online
Period: 2/27/22 to 3/1/22

Keywords

  • accelerating vector-matrix multiplication
  • processing-in-memory
  • speech recognition

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
