High-performance video content recognition with long-term recurrent convolutional network for FPGA

Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited for HLS-based design for FPGA. However, optimizing large neural networks under resource constraints is still a key challenge. HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolution Network for video inputs. Our implementation on a Xilinx VC709 board achieves 3.1X speedup compared to an NVIDIA K80 and 4.75X speedup compared to an Intel Xeon with 17.5X lower energy per image.
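
The abstract mentions fixed-point weight re-training but does not give the number format or the training procedure. As a rough illustration of the underlying idea only, the sketch below rounds floating-point weights to a signed fixed-point grid and saturates values that fall outside the representable range. The function name `to_fixed_point`, the 16-bit word length, and the 8 fractional bits are assumptions chosen for this example, not values taken from the paper, and the re-training step itself is not shown.

```python
def to_fixed_point(w, total_bits=16, frac_bits=8):
    """Round a float to signed fixed-point and return the value it represents.

    total_bits and frac_bits are illustrative assumptions, not the paper's format.
    """
    scale = 1 << frac_bits                      # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))            # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1         # most positive integer code
    q = round(w * scale)                        # nearest integer code
    q = max(q_min, min(q_max, q))               # saturate instead of wrapping
    return q / scale                            # value the hardware would see


weights = [0.1234, -1.5, 200.0]
quantized = [to_fixed_point(w) for w in weights]
errors = [abs(w - q) for w, q in zip(weights, quantized)]
```

With these assumed settings, small weights pick up a rounding error of at most half a step (1/512), while a value such as 200.0 saturates at the largest representable number. Re-training with quantized weights is one common way to recover accuracy lost to this kind of error.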

Original language: English (US)
Title of host publication: 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017
Editors: Diana Gohringer, Dirk Stroobandt, Nele Mentens, Marco Santambrogio, Jari Nurmi
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9789090304281
DOIs: https://doi.org/10.23919/FPL.2017.8056833
State: Published - Oct 2 2017
Event: 27th International Conference on Field Programmable Logic and Applications, FPL 2017 - Gent, Belgium
Duration: Sep 4 2017 to Sep 6 2017

Publication series

Name: 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017

Other

Other: 27th International Conference on Field Programmable Logic and Applications, FPL 2017
Country: Belgium
City: Gent
Period: 9/4/17 to 9/6/17

Fingerprint

Field programmable gate arrays (FPGA)
Neural networks
Data storage equipment
Network layers
Convolution
Resource allocation
Program processors
Energy utilization
Engines

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Zhang, X., Liu, X., Ramachandran, A., Zhuge, C., Tang, S., Ouyang, P., ... Chen, D. (2017). High-performance video content recognition with long-term recurrent convolutional network for FPGA. In D. Gohringer, D. Stroobandt, N. Mentens, M. Santambrogio, & J. Nurmi (Eds.), 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017 [8056833] (2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/FPL.2017.8056833

High-performance video content recognition with long-term recurrent convolutional network for FPGA. / Zhang, Xiaofan; Liu, Xinheng; Ramachandran, Anand; Zhuge, Chuanhao; Tang, Shibin; Ouyang, Peng; Cheng, Zuofu; Rupnow, Kyle; Chen, Deming. 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017. ed. / Diana Gohringer; Dirk Stroobandt; Nele Mentens; Marco Santambrogio; Jari Nurmi. Institute of Electrical and Electronics Engineers Inc., 2017. 8056833 (2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017).

Zhang, X, Liu, X, Ramachandran, A, Zhuge, C, Tang, S, Ouyang, P, Cheng, Z, Rupnow, K & Chen, D 2017, High-performance video content recognition with long-term recurrent convolutional network for FPGA. in D Gohringer, D Stroobandt, N Mentens, M Santambrogio & J Nurmi (eds), 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017, 8056833, 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017, Institute of Electrical and Electronics Engineers Inc., 27th International Conference on Field Programmable Logic and Applications, FPL 2017, Gent, Belgium, 9/4/17. https://doi.org/10.23919/FPL.2017.8056833

Zhang X, Liu X, Ramachandran A, Zhuge C, Tang S, Ouyang P et al. High-performance video content recognition with long-term recurrent convolutional network for FPGA. In Gohringer D, Stroobandt D, Mentens N, Santambrogio M, Nurmi J, editors, 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017. Institute of Electrical and Electronics Engineers Inc. 2017. 8056833. (2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017). https://doi.org/10.23919/FPL.2017.8056833

Zhang, Xiaofan ; Liu, Xinheng ; Ramachandran, Anand ; Zhuge, Chuanhao ; Tang, Shibin ; Ouyang, Peng ; Cheng, Zuofu ; Rupnow, Kyle ; Chen, Deming. / High-performance video content recognition with long-term recurrent convolutional network for FPGA. 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017. editor / Diana Gohringer ; Dirk Stroobandt ; Nele Mentens ; Marco Santambrogio ; Jari Nurmi. Institute of Electrical and Electronics Engineers Inc., 2017. (2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017).

@inproceedings{e10c451278914a5ca29b5eb523d3c422,
title = "High-performance video content recognition with long-term recurrent convolutional network for FPGA",
abstract = "FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited for HLS-based design for FPGA. However, optimizing large neural networks under resource constraints is still a key challenge. HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolution Network for video inputs. Our implementation on a Xilinx VC709 board achieves 3.1X speedup compared to an NVIDIA K80 and 4.75X speedup compared to an Intel Xeon with 17.5X lower energy per image.",
author = "Xiaofan Zhang and Xinheng Liu and Anand Ramachandran and Chuanhao Zhuge and Shibin Tang and Peng Ouyang and Zuofu Cheng and Kyle Rupnow and Deming Chen",
year = "2017",
month = "10",
day = "2",
doi = "10.23919/FPL.2017.8056833",
language = "English (US)",
series = "2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
editor = "Diana Gohringer and Dirk Stroobandt and Nele Mentens and Marco Santambrogio and Jari Nurmi",
booktitle = "2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017",
address = "United States",

}

TY - GEN

T1 - High-performance video content recognition with long-term recurrent convolutional network for FPGA

AU - Zhang, Xiaofan

AU - Liu, Xinheng

AU - Ramachandran, Anand

AU - Zhuge, Chuanhao

AU - Tang, Shibin

AU - Ouyang, Peng

AU - Cheng, Zuofu

AU - Rupnow, Kyle

AU - Chen, Deming

PY - 2017/10/2

Y1 - 2017/10/2

N2 - FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited for HLS-based design for FPGA. However, optimizing large neural networks under resource constraints is still a key challenge. HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolution Network for video inputs. Our implementation on a Xilinx VC709 board achieves 3.1X speedup compared to an NVIDIA K80 and 4.75X speedup compared to an Intel Xeon with 17.5X lower energy per image.

UR - http://www.scopus.com/inward/record.url?scp=85034426984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034426984&partnerID=8YFLogxK

U2 - 10.23919/FPL.2017.8056833

DO - 10.23919/FPL.2017.8056833

M3 - Conference contribution

AN - SCOPUS:85034426984

T3 - 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017

BT - 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017

A2 - Gohringer, Diana

A2 - Stroobandt, Dirk

A2 - Mentens, Nele

A2 - Santambrogio, Marco

A2 - Nurmi, Jari

PB - Institute of Electrical and Electronics Engineers Inc.

ER -