TY - JOUR
T1 - A Unified Framework for Information-Theoretic Generalization Bounds
AU - Chu, Yifeng
AU - Raginsky, Maxim
N1 - This work was supported by the Illinois Institute for Data Science and Dynamical Systems (iDS²), an NSF HDR TRIPODS institute, under award CCF-1934986. The authors would like to thank Matus Telgarsky for some valuable suggestions and the anonymous reviewers for pointing out some relevant work that was not cited in the original submission.
PY - 2023
Y1 - 2023
AB - This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in L_{ψ_p} Orlicz spaces. Using the decorrelation lemma in combination with other techniques, such as symmetrization, couplings, and chaining in the space of probability measures, we obtain new upper bounds on the generalization error, both in expectation and in high probability, and recover as special cases many of the existing generalization bounds, including the ones based on mutual information, conditional mutual information, stochastic chaining, and PAC-Bayes inequalities. In addition, the Fernique-Talagrand upper bound on the expected supremum of a subgaussian process emerges as a special case.
UR - http://www.scopus.com/inward/record.url?scp=85185876432&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185876432&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85185876432
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -