TY - JOUR
T1 - Learning neural networks with adaptive regularization
AU - Zhao, Han
AU - Tsai, Yao-Hung Hubert
AU - Salakhutdinov, Ruslan
AU - Gordon, Geoffrey J.
N1 - HZ and GG would like to acknowledge support from the DARPA XAI project, contract #FA87501720152 and an Nvidia GPU grant. YT and RS were supported in part by DARPA grant FA875018C0150, DARPA SAGAMORE HR00111990016, Office of Naval Research grant N000141812861, AFRL CogDeCON, and Apple. YT and RS would also like to acknowledge NVIDIA’s GPU support. Last, we thank Denny Wu for suggestions on exploring and analyzing our algorithm in terms of stable rank.
PY - 2019
Y1 - 2019
AB - Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another. Hence, it leads to an adaptive and data-dependent regularization when training networks on small datasets. To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures. Our code is publicly available at: https://github.com/yaohungt/Adaptive-Regularization-Neural-Network.
UR - http://www.scopus.com/inward/record.url?scp=85090171378&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090171378&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090171378
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -