TwinDNN: A tale of two deep neural networks

Hyunmin Jeong, Deming Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Compression techniques for deep neural networks (DNNs), such as weight quantization, have been widely investigated to reduce model size so that DNNs can be implemented on hardware with strict resource constraints. However, one major downside of model compression is accuracy degradation. To address this problem, we propose a new inference scheme that couples a high-accuracy but slower DNN with a highly compressed version of it, which typically delivers much faster inference but lower accuracy. During inference, we determine the confidence of the compressed DNN's prediction and run the original network on the inputs for which the compressed DNN is not confident. The proposed design makes balanced use of the resources available on the hardware and delivers overall accuracy close to that of the high-accuracy model, with inference speed closer to that of the compressed DNN. We demonstrate our design on two image classification tasks: CIFAR-10 and ImageNet. Our experiments show that our design can recover up to 94% of the accuracy drop caused by extreme network compression, with more than 90% speedup compared to using the original DNN alone. This is more than 17% extra accuracy recovery and 36% extra speedup compared to previous work with a similar concept on VGG-16. This is the first work that considers using a highly compressed DNN along with the original DNN in parallel to achieve high accuracy and speed at the same time, while maintaining resource balance by efficiently using two different main computation sources on an FPGA.
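The confidence-gated scheme described in the abstract can be illustrated with a minimal software sketch. This is not the authors' implementation: the abstract does not say how confidence is measured, so the sketch assumes the maximum softmax probability compared against a hypothetical `threshold`, and it runs the two networks one after the other rather than in parallel on an FPGA. The names `twin_predict`, `toy_compressed`, and `toy_original` are illustrative placeholders.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable that maps an input to a list of class logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def twin_predict(
    x: Sequence[float],
    compressed_model: Model,
    original_model: Model,
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[int, bool]:
    """Return (predicted_class, used_original).

    The compressed model always runs. Its prediction is accepted when the
    assumed confidence measure (max softmax probability) reaches the
    threshold; otherwise the original model is run on the same input.
    """
    probs = softmax(compressed_model(x))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), False
    probs = softmax(original_model(x))
    return probs.index(max(probs)), True


# Toy stand-ins for the two networks (purely illustrative).
def toy_compressed(x: Sequence[float]) -> List[float]:
    return [4.0 * x[0], 0.0]


def toy_original(x: Sequence[float]) -> List[float]:
    return [0.0, 5.0]


if __name__ == "__main__":
    print(twin_predict([2.0], toy_compressed, toy_original))  # confident input
    print(twin_predict([0.1], toy_compressed, toy_original))  # falls back
```

In this sketch the threshold trades speed for accuracy: a higher value sends more inputs to the original network, and a lower value accepts more of the compressed network's predictions.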

Original language: English (US)
Title of host publication: Proceedings - 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 133-140
Number of pages: 8
ISBN (Electronic): 9781665427012
State: Published - Jul 2021
Event: 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2021 - Virtual, Online, United States
Duration: Jul 7 2021 → Jul 8 2021

Publication series

Name: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
Volume: 2021-text
ISSN (Print): 1063-6862

Conference

Conference: 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2021
Country/Territory: United States
City: Virtual, Online
Period: 7/7/21 → 7/8/21

Keywords

  • Hardware Accelerator
  • High-Level-Synthesis
  • Machine Learning
  • Neural Network Quantization

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications
