Spectrally-normalized margin bounds for neural networks

Peter L. Bartlett, Dylan J. Foster, Matus Jan Telgarsky

Research output: Contribution to journal › Conference article

Abstract

This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity.
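As a rough illustration of the quantity the abstract describes, the sketch below computes the product of per-layer spectral norms together with a (2,1)-norm correction term for a toy network. The function name spectral_complexity and the layer shapes are mine, not the paper's; the sketch also simplifies the paper's definition by assuming 1-Lipschitz nonlinearities (e.g. ReLU) and zero reference matrices.

import numpy as np

def spectral_complexity(weights):
    # Sketch of the margin-normalization factor described in the abstract:
    # product of per-layer spectral norms, times a correction term built from
    # (2,1)-norms. Simplified: 1-Lipschitz activations, zero reference matrices.
    spec = [np.linalg.norm(W, ord=2) for W in weights]            # largest singular value per layer
    two_one = [np.linalg.norm(W, axis=1).sum() for W in weights]  # sum of row l2 norms, i.e. ||W^T||_{2,1}
    lipschitz = np.prod(spec)                                     # Lipschitz constant of the linear maps
    correction = sum((t / s) ** (2 / 3) for t, s in zip(two_one, spec)) ** 1.5
    return lipschitz * correction

# Toy 3-layer network with Gaussian weights (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
layers = [rng.standard_normal((256, 784)),
          rng.standard_normal((128, 256)),
          rng.standard_normal((10, 128))]
print(spectral_complexity(layers))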

Original language: English (US)
Pages (from-to): 6241-6250
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2017-December
State: Published - Jan 1 2017
Event: 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: Dec 4 2017 - Dec 9 2017

Fingerprint

  • Labels
  • Neural networks

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Spectrally-normalized margin bounds for neural networks. / Bartlett, Peter L.; Foster, Dylan J.; Telgarsky, Matus Jan.

In: Advances in Neural Information Processing Systems, Vol. 2017-December, 01.01.2017, p. 6241-6250.

Research output: Contribution to journal › Conference article

Bartlett, Peter L. ; Foster, Dylan J. ; Telgarsky, Matus Jan. / Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems. 2017 ; Vol. 2017-December. pp. 6241-6250.
@article{2ec314903eb14f36b2ddd4542ea164d0,
title = "Spectrally-normalized margin bounds for neural networks",
abstract = "This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the mnist and cif ar 10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity.",
author = "Bartlett, {Peter L.} and Foster, {Dylan J.} and Telgarsky, {Matus Jan}",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "2017-December",
pages = "6241--6250",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

TY - JOUR

T1 - Spectrally-normalized margin bounds for neural networks

AU - Bartlett, Peter L.

AU - Foster, Dylan J.

AU - Telgarsky, Matus Jan

PY - 2017/1/1

Y1 - 2017/1/1

N2 - This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity.

AB - This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity.

UR - http://www.scopus.com/inward/record.url?scp=85046992478&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046992478&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85046992478

VL - 2017-December

SP - 6241

EP - 6250

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -