ON THE BENEFIT OF WIDTH FOR NEURAL NETWORKS: DISAPPEARANCE OF BASINS

Dawei Li, Tian Ding, Ruoyu Sun

Research output: Contribution to journal › Article › peer-review

Abstract

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the differences between wide and narrow networks. In this work, we prove that going from narrow to wide networks induces a phase transition from having suboptimal basins to having no suboptimal basins. Specifically, we prove two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no suboptimal basin, where a "basin" is defined as a setwise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. Together, these two results establish the phase transition from narrow to wide networks.
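As a reading aid, here is one common formalization of the setwise notion of "basin" used above; this is a hedged sketch, and the paper's precise definition may differ in details (e.g., compactness or connectedness requirements on the set):

\[
S \subset \mathbb{R}^d \ \text{is a setwise strict local minimum of}\ L
\iff
L|_S \equiv c \ \text{and}\ \exists\, \epsilon > 0 :\
L(\theta') > c \quad \forall\, \theta' \in B(S, \epsilon) \setminus S,
\]

where $B(S, \epsilon) = \{\theta' : \operatorname{dist}(\theta', S) < \epsilon\}$ is the $\epsilon$-neighborhood of $S$. Under this reading, an ordinary strict local minimum is the special case $S = \{\theta\}$, and a suboptimal basin is such a set $S$ whose common loss value $c$ exceeds the global infimum of $L$.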

Original language: English (US)
Pages (from-to): 1728-1758
Number of pages: 31
Journal: SIAM Journal on Optimization
Volume: 32
Issue number: 3
DOIs
State: Published - 2022

Keywords

  • deep learning
  • landscape
  • neural networks
  • overparameterization
  • phase transition

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Applied Mathematics
