Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS

Qin Li, Xiaofan Zhang, Jin Jun Xiong, Wen Mei Hwu, Deming Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural machine translation (NMT) is a popular topic in natural language processing that uses deep neural networks (DNNs) to translate from a source language to a target language. With emerging techniques such as bidirectional Gated Recurrent Units (GRUs), attention mechanisms, and beam-search algorithms, NMT can deliver better translation quality than conventional statistics-based methods, especially for long sentences. However, higher translation quality means more complicated models, higher computation and memory demands, and longer translation time, which makes practical use difficult. In this paper, we propose a design methodology for implementing the inference of a real-life NMT model (problem size of 172 GFLOP) on an FPGA to improve run-time latency and energy efficiency. We use High-Level Synthesis (HLS) to build high-performance parameterized IPs for the most basic operations (multiply-accumulations) and compose these IPs to accelerate the matrix-vector multiplication (MVM) kernels that are frequently used in NMT. We also perform a design space exploration that considers both computation resources and memory access bandwidth when exploiting the hardware parallelism in the model, and generate the best parameter configurations for the proposed IPs. Accordingly, we propose a novel hybrid parallel structure for accelerating NMT with affordable resource overhead on the targeted FPGA. Our design is demonstrated on a Xilinx VCU118 with an overall performance of 7.16 GFLOPS.
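To make the idea of a parameterized multiply-accumulate IP composed into an MVM kernel more concrete, here is a minimal C++ sketch in the style of HLS source code. It is not the authors' implementation: the names `mac_lanes` and `mvm`, the parallelism parameter `PAR`, and the use of `float` are assumptions made for illustration. The pragma hints appear only as comments, since the abstract does not state which directives the design uses.

```cpp
#include <cstddef>

// Hypothetical parameterized MAC unit: multiplies PAR weight/input pairs
// and reduces them to one partial sum. In an HLS flow, both loops would be
// candidates for full unrolling (e.g. "#pragma HLS UNROLL") so that PAR
// multipliers work in parallel. That directive choice is an assumption.
template <std::size_t PAR>
float mac_lanes(const float* w, const float* x) {
    float partial[PAR];
    for (std::size_t p = 0; p < PAR; ++p) {
        partial[p] = w[p] * x[p];  // PAR independent multiplications
    }
    float sum = 0.0f;
    for (std::size_t p = 0; p < PAR; ++p) {
        sum += partial[p];         // reduction of the PAR products
    }
    return sum;
}

// Hypothetical MVM kernel y = W * x built from the MAC unit above.
// ROWS and COLS are the matrix dimensions; PAR controls how many
// multiply-accumulates are issued per inner-loop step. A larger PAR
// needs more multipliers and more memory bandwidth per step, which is
// the kind of trade-off a design space exploration would weigh.
template <std::size_t ROWS, std::size_t COLS, std::size_t PAR>
void mvm(const float (&W)[ROWS][COLS], const float (&x)[COLS], float (&y)[ROWS]) {
    static_assert(PAR > 0 && COLS % PAR == 0, "COLS must be a multiple of PAR");
    for (std::size_t r = 0; r < ROWS; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < COLS; c += PAR) {
            acc += mac_lanes<PAR>(&W[r][c], &x[c]);
        }
        y[r] = acc;
    }
}
```

For example, calling `mvm<2, 4, 2>(W, x, y)` on a 2x4 matrix processes each row in two steps of two multiply-accumulates. Changing only `PAR` changes the degree of parallelism without touching the kernel body, which is the sense in which such an IP is parameterized.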

Original language: English (US)
Title of host publication: ASP-DAC 2019 - 24th Asia and South Pacific Design Automation Conference
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 693-698
Number of pages: 6
ISBN (Electronic): 9781450360074
DOIs: https://doi.org/10.1145/3287624.3287717
State: Published - Jan 21 2019
Event: 24th Asia and South Pacific Design Automation Conference, ASPDAC 2019 - Tokyo, Japan
Duration: Jan 21 2019 to Jan 24 2019

Publication series

Name: Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Other

Other: 24th Asia and South Pacific Design Automation Conference, ASPDAC 2019
Country: Japan
City: Tokyo
Period: 1/21/19 to 1/24/19

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design


  • Cite this

    Li, Q., Zhang, X., Xiong, J. J., Hwu, W. M., & Chen, D. (2019). Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS. In ASP-DAC 2019 - 24th Asia and South Pacific Design Automation Conference (pp. 693-698). (Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3287624.3287717