TY - JOUR
T1 - Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
T2 - Foundations and Trends in Machine Learning
AU - Hellström, Fredrik
AU - Durisi, Giuseppe
AU - Guedj, Benjamin
AU - Raginsky, Maxim
N1 - The authors thank Yunwen Lei, Alex Olshevsky, Olivier Wintenberger, the anonymous referees, and the associate editor for providing valuable comments on an earlier version of this work. The authors are grateful to the team at Foundations and Trends® in Machine Learning, in particular Mike Casey and Lucy Wiseman, for their diligent handling of this manuscript. F.H. and G.D. acknowledge support by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. G.D. also acknowledges support by the Swedish Foundation for Strategic Research under grant number FUS21-0004. B.G. acknowledges partial support by the U.S. Army Research Laboratory and the U.S. Army Research Office, and by the U.K. Ministry of Defence and the U.K. Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/R013616/1. B.G. acknowledges partial support from the French National Agency for Research, through grants ANR-18-CE40-0016-01 and ANR-18-CE23-0015-02, and through the programme “France 2030” and PEPR IA on grant SHARP ANR-23-PEIA-0008. M.R. acknowledges partial support by the U.S. National Science Foundation (NSF) through the Illinois Institute for Data Science and Dynamical Systems (iDS2), an NSF HDR TRIPODS Institute, under Award CCF-193498.
PY - 2025/1/23
Y1 - 2025/1/23
N2 - A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework, analytical studies of the information complexity of learning algorithms, and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
AB - A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework, analytical studies of the information complexity of learning algorithms, and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
UR - http://www.scopus.com/inward/record.url?scp=105000122478&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105000122478&partnerID=8YFLogxK
U2 - 10.1561/2200000112
DO - 10.1561/2200000112
M3 - Article
AN - SCOPUS:105000122478
SN - 1935-8237
VL - 18
SP - 1
EP - 223
JO - Foundations and Trends in Machine Learning
JF - Foundations and Trends in Machine Learning
IS - 1
ER -