Tightening mutual information-based bounds on generalization error

Yuheng Bu, Shaofeng Zou, Venugopal V. Veeravalli

Research output: Contribution to journal › Article › peer-review

Abstract

An information-theoretic upper bound on the generalization error of supervised learning algorithms is derived. The bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm. The bound is derived under more general conditions on the loss function than in existing studies; nevertheless, it provides a tighter characterization of the generalization error. Examples of learning algorithms are provided to demonstrate the tightness of the bound, and to show that it has a broad range of applicability. Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results. Finally, it is demonstrated that, unlike existing bounds, which are difficult to compute and evaluate empirically, the proposed bound can be estimated easily in practice.
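For orientation, a sketch of the kind of individual-sample bound the abstract describes, written here under the standard assumption that the loss is σ-sub-Gaussian under the data distribution; the specific constants and conditions below follow the common form in this literature and are an assumption on my part, not a restatement of the paper's theorem (which is derived under more general conditions on the loss):

\[
\bigl|\mathrm{gen}(\mu, P_{W \mid S})\bigr| \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W; Z_i)},
\]

where W is the output of the learning algorithm, S = (Z_1, …, Z_n) is the set of n i.i.d. training samples drawn from μ, and I(W; Z_i) is the mutual information between the output and an individual sample. Since Σ_i I(W; Z_i) ≤ I(W; S) for i.i.d. samples and the square root is concave, a bound of this form is never looser than the corresponding full-sample bound \(\sqrt{2\sigma^{2} I(W; S)/n}\), which is one way to read the "tighter characterization" claim in the abstract.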

Original language: English (US)
Article number: 2991139
Pages (from-to): 121-130
Number of pages: 10
Journal: IEEE Journal on Selected Areas in Information Theory
Volume: 1
Issue number: 1
DOI: 10.1109/JSAIT.2020.2991139
State: Published - May 2020

Keywords

  • Cumulant generating function
  • Generalization error
  • Information-theoretic bounds
  • Stochastic gradient Langevin dynamics

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Media Technology
  • Artificial Intelligence
  • Applied Mathematics
