Sparseness Analysis in the Pretraining of Deep Neural Networks

Jun Li, Tong Zhang, Wei Luo, Jian Yang, Xiao Tong Yuan, Jian Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

A major advance in deep multilayer neural networks (DNNs) has been the invention of various unsupervised pretraining methods that initialize network parameters and lead to good prediction accuracy. This paper presents a sparseness analysis of the hidden units in the pretraining process. In particular, we use the L1-norm to measure sparseness and provide sufficient conditions under which pretraining leads to sparseness for popular pretraining models such as denoising autoencoders (DAEs) and restricted Boltzmann machines (RBMs). Our experimental results demonstrate that when these sufficient conditions are satisfied, the pretraining models do lead to sparseness. Our experiments also reveal that with sigmoid activation functions, pretraining plays an important sparseness-inducing role in DNNs with sigmoid units (Dsigm), whereas with rectifier linear unit (ReLU) activation functions, pretraining becomes less effective for DNNs with ReLU (Drelu). Fortunately, Drelu can reach higher recognition accuracy than DNNs with pretraining (DAEs and RBMs), since it captures the main benefit of pretraining in Dsigm, namely the encouragement of sparseness. However, ReLU is not adapted to the different firing rates of biological neurons, because the firing rate actually changes with varying membrane resistance. To address this problem, we further propose a family of rectifier piecewise linear units (RePLUs) to fit the different firing rates. The experimental results show that RePLU performs better than ReLU and is comparable with pretraining techniques such as RBMs and DAEs.
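
The abstract's central quantity is an L1-norm measure of hidden-unit sparseness, compared across activation functions. Below is a minimal illustrative sketch of such a measurement on a toy single hidden layer; the layer sizes, random weights, and the per-example averaging are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def l1_sparseness(h):
    # Average L1 norm of the hidden activations per example;
    # smaller values indicate sparser hidden representations.
    return np.abs(h).sum(axis=1).mean()

# Toy single hidden layer: 784 inputs -> 256 hidden units (hypothetical sizes).
X = rng.standard_normal((100, 784))
W = rng.standard_normal((784, 256)) * 0.05
b = np.zeros(256)

pre_activation = X @ W + b
print("sigmoid hidden L1:", l1_sparseness(sigmoid(pre_activation)))
print("ReLU hidden L1:   ", l1_sparseness(relu(pre_activation)))
```

In this sketch, the ReLU layer typically yields a smaller L1 value because roughly half of its pre-activations are clipped to zero, which reflects the abstract's point that ReLU networks already obtain the sparseness that pretraining encourages in sigmoid networks.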

Original language: English (US)
Article number: 7445251
Pages (from-to): 1425-1438
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 28
Issue number: 6
DOIs
State: Published - Jun 2017
Externally published: Yes

Keywords

  • Activation function
  • deep neural networks
  • infomax principle
  • sparseness
  • unsupervised pretraining

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
