Efficient Methods for Mapping Neural Machine Translator on FPGAs

Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen Mei Hwu, Deming Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Neural machine translation (NMT) is one of the most critical applications in natural language processing (NLP); its main idea is to convert text in one language to another using deep neural networks. In recent years, we have seen continuous development of NMT through the integration of emerging technologies, such as bidirectional gated recurrent units (GRU), attention mechanisms, and beam-search algorithms, for improved translation quality. However, with increasing problem size, real-life NMT models have become much more complicated and difficult to implement on hardware for acceleration. In this article, we aim to exploit the capability of FPGAs to deliver highly efficient implementations for real-life NMT applications. We map the inference of a large-scale NMT model with a total computation of 172 GFLOP to a highly optimized high-level synthesis (HLS) IP and integrate the IP into the Xilinx VCU118 FPGA platform. The model has widely used key features of NMTs, including the bidirectional GRU layer, attention mechanism, and beam search. We quantize the model to a mixed-precision representation in which parameters and portions of the calculations are in 16-bit half precision, while the rest remain in 32-bit floating point. Compared to the float NMT implementation on FPGA, we achieve a 13.1× speedup with an end-to-end performance of 22.0 GFLOPS without any accuracy degradation. To the best of our knowledge, this is the first work that successfully implements a real-life end-to-end NMT model on an FPGA board.
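
The mixed-precision idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' HLS implementation: the function name `gate_mixed_precision`, the choice of a single sigmoid gate, and the decision to keep the accumulation and the nonlinearity in 32-bit are all assumptions made for illustration. It only shows the general pattern of storing parameters in 16-bit half precision while keeping other calculations in 32-bit floating point.

```python
import numpy as np

def gate_mixed_precision(w_half, x_half, b_single):
    """Hypothetical gate: fp16 storage, fp32 accumulation and activation."""
    # Parameters and inputs are stored in 16-bit half precision.
    assert w_half.dtype == np.float16 and x_half.dtype == np.float16
    # Upcast before the matrix-vector product so sums accumulate in 32-bit
    # (an assumption about which calculations stay in single precision).
    acc = w_half.astype(np.float32) @ x_half.astype(np.float32)
    acc = acc + b_single.astype(np.float32)
    # Sigmoid evaluated in 32-bit floating point.
    return 1.0 / (1.0 + np.exp(-acc))

def gate_reference(w, x, b):
    """Full 32-bit reference for comparison."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Small random example; sizes are arbitrary and not taken from the paper.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)
b = np.zeros(4, dtype=np.float32)

out = gate_mixed_precision(w.astype(np.float16), x.astype(np.float16), b)
ref = gate_reference(w, x, b)
max_abs_err = float(np.max(np.abs(out - ref)))
print("max abs error vs. float32 reference:", max_abs_err)
```

Comparing the result against the 32-bit reference gives a quick feel for how much error half-precision storage introduces for a given layer; a real deployment would measure this on translation quality rather than on a single gate.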

Original language: English (US)
Article number: 9309170
Pages (from-to): 1866-1877
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 32
Issue number: 7
State: Accepted/In press - 2020

Keywords

  • FPGA
  • Hardware-efficient inference
  • high level synthesis
  • neural machine translation

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
