Information-Theoretic Stability and Generalization

Maxim Raginsky, Alexander Rakhlin, Aolin Xu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Machine-learning algorithms can be viewed as stochastic transformations that map training data to hypotheses. Following Bousquet and Elisseeff, we say such an algorithm is stable if its output does not depend too much on any individual training example. Since stability is closely connected to generalization capabilities of learning algorithms, it is of interest to obtain sharp quantitative estimates on the generalization bias of machine-learning algorithms in terms of their stability properties. We describe several information-theoretic measures of algorithmic stability and illustrate their use for upper-bounding the generalization bias of learning algorithms. Specifically, we relate the expected generalization error of a learning algorithm to several information-theoretic quantities that capture the statistical dependence between the training data and the hypothesis. These include mutual information and erasure mutual information, and their counterparts induced by the total variation distance. We illustrate the general theory through examples, including the Gibbs algorithm and differentially private algorithms, and discuss strategies for controlling the generalization error.
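The abstract's central claim — that the expected generalization error can be bounded by the mutual information between the training data and the hypothesis — can be illustrated numerically. The sketch below is not the authors' construction; it is a minimal worked example of a bound of the form |E[gen]| ≤ sqrt(2σ²·I(S;W)/n) for a σ-sub-Gaussian loss, using an assumed toy setup: a noisy Gaussian mean estimator, chosen because I(S;W) then has a closed form (a Gaussian channel), and a clipped squared loss, which is bounded and hence sub-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10        # training-set size (toy choice)
tau2 = 0.01   # variance of output noise; without it W is deterministic in S and I(S;W) is infinite
c = 4.0       # clipping level: a loss in [0, c] is (c/2)-sub-Gaussian
sigma2 = (c / 2.0) ** 2

# W = mean(S) + N(0, tau2) depends on S only through mean(S) ~ N(0, 1/n),
# so I(S;W) = I(mean(S); W) = 0.5 * ln(1 + (1/n)/tau2) nats (Gaussian channel).
mi = 0.5 * np.log(1.0 + (1.0 / n) / tau2)

# Mutual-information generalization bound for a sigma-sub-Gaussian loss.
bound = np.sqrt(2.0 * sigma2 * mi / n)

def clipped_sq_loss(w, z):
    """Squared loss clipped at c, so it is bounded (hence sub-Gaussian)."""
    return np.minimum((w - z) ** 2, c)

# Monte Carlo estimate of the expected generalization gap
# E[population risk(W) - empirical risk(W)].
trials = 5000
gaps = np.empty(trials)
for t in range(trials):
    S = rng.standard_normal(n)                       # training data, Z_i ~ N(0, 1)
    W = S.mean() + np.sqrt(tau2) * rng.standard_normal()
    emp = clipped_sq_loss(W, S).mean()               # empirical risk on the training set
    pop = clipped_sq_loss(W, rng.standard_normal(2000)).mean()  # fresh-data risk
    gaps[t] = pop - emp

gap = gaps.mean()
print(f"I(S;W) = {mi:.3f} nats, bound = {bound:.3f}, estimated gap = {gap:.3f}")
```

Because the estimator reuses its training data, the empirical risk is optimistically biased and the gap is positive; the information-theoretic bound sits comfortably above the Monte Carlo estimate, which is the qualitative behavior the chapter's bounds formalize.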

Original language: English (US)
Title of host publication: Information-Theoretic Methods in Data Science
Publisher: Cambridge University Press
Pages: 302-329
Number of pages: 28
ISBN (Electronic): 9781108616799
ISBN (Print): 9781108427135
DOIs
State: Published - Jan 1 2021
Externally published: Yes

Keywords

  • generalization
  • learning
  • sample complexity
  • stability
  • supervised learning
  • unsupervised learning
  • Vapnik–Chervonenkis theory

ASJC Scopus subject areas

  • General Engineering
  • General Computer Science
  • General Social Sciences
  • General Mathematics
