Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices

Xinheng Liu, Dae Hee Kim, Deming Chen, Chang Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As machine learning algorithms have become more practical, there has been considerable effort to implement them on edge devices used in daily life. Unlike server-scale devices, however, edge devices are relatively small and have far more limited resources. Controlling resource usage and optimizing the hardware therefore play an important role when implementing machine learning algorithms on an edge device. In this paper, we target convolutional neural networks (CNNs) and explore various optimization and design techniques to realize them on FPGA devices. The key idea explored in this paper is Backward Pipeline Scheduling together with Latency Balancing, which optimize the pipeline between CNN layers to significantly reduce the overall latency for processing a single image. We also develop a batch processing design to improve the throughput of the FPGA solution. We achieve a latency of 175.7 µs for classifying one image in the MNIST data set using LeNet and 653.4 µs for classifying one image in the CIFAR-10 data set using CifarNet. Without retraining, we still maintain high accuracy: 97.6% for the MNIST data set and 83.6% for the CIFAR-10 data set. Our single-image classification is 5.2x faster for LeNet and 1.95x faster for CifarNet than the NVIDIA Jetson TX1 solution.

Original language: English (US)
Title of host publication: Proceedings of the 20th System Level Interconnect Prediction Workshop, SLIP 2018
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450359009
State: Published - Jun 23 2018
Event: 20th System Level Interconnect Prediction Workshop, SLIP 2018 - San Francisco, United States
Duration: Jun 23 2018 → …

Publication series

Name: Proceedings of the 20th System Level Interconnect Prediction Workshop, SLIP 2018

Other

Other: 20th System Level Interconnect Prediction Workshop, SLIP 2018
Country/Territory: United States
City: San Francisco
Period: 6/23/18 → …

Keywords

  • Acceleration
  • Convolutional Neural Network
  • FPGA
  • High-Level Synthesis
  • Optimization

ASJC Scopus subject areas

  • Applied Mathematics
  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Hardware and Architecture
