Abstract
Machine-learning algorithms can be viewed as stochastic transformations that map training data to hypotheses. Following Bousquet and Elisseeff, we say such an algorithm is stable if its output does not depend too much on any individual training example. Since stability is closely connected to the generalization capabilities of learning algorithms, it is of interest to obtain sharp quantitative estimates of the generalization bias of machine-learning algorithms in terms of their stability properties. We describe several information-theoretic measures of algorithmic stability and illustrate their use for upper-bounding the generalization bias of learning algorithms. Specifically, we relate the expected generalization error of a learning algorithm to several information-theoretic quantities that capture the statistical dependence between the training data and the hypothesis. These include mutual information and erasure mutual information, as well as their counterparts induced by the total variation distance. We illustrate the general theory through examples, including the Gibbs algorithm and differentially private algorithms, and discuss strategies for controlling the generalization error.
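The abstract's central claim, that expected generalization error can be bounded by the mutual information between the training data and the hypothesis, can be illustrated numerically. The sketch below assumes the well-known Xu–Raginsky form of such a bound for a σ-subgaussian loss, |E[gen]| ≤ √(2σ²·I(S;W)/n); the function name and the sample values are illustrative, not taken from the chapter.

```python
import math

def mi_generalization_bound(mutual_info: float, sigma: float, n: int) -> float:
    """Upper bound on |E[generalization error]| for a sigma-subgaussian loss,
    in terms of the mutual information I(S; W) (in nats) between the n-sample
    training set S and the learned hypothesis W:
        sqrt(2 * sigma**2 * I(S; W) / n).
    """
    return math.sqrt(2.0 * sigma ** 2 * mutual_info / n)

# The bound shrinks as the sample size grows, or as the hypothesis
# depends less strongly on the data (smaller mutual information).
print(mi_generalization_bound(mutual_info=1.0, sigma=0.5, n=100))
```

Note how the two levers discussed in the abstract appear explicitly: stability (small `mutual_info`) and sample size (large `n`) both drive the bound toward zero.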
| Original language | English (US) |
| --- | --- |
| Title of host publication | Information-Theoretic Methods in Data Science |
| Publisher | Cambridge University Press |
| Pages | 302-329 |
| Number of pages | 28 |
| ISBN (Electronic) | 9781108616799 |
| ISBN (Print) | 9781108427135 |
| State | Published - Jan 1 2021 |
Keywords
- generalization
- learning
- sample complexity
- stability
- supervised learning
- unsupervised learning
- Vapnik–Chervonenkis theory
ASJC Scopus subject areas
- General Engineering
- General Computer Science
- General Social Sciences
- General Mathematics