Abstract
Many statistical estimators for high-dimensional linear regression are $M$-estimators, formed by minimizing a data-dependent squared loss plus a regularizer. This work considers a new class of estimators implicitly defined through a discretized gradient dynamical system under overparameterization. We show that, under suitable restricted isometry conditions, overparameterization leads to implicit regularization: if gradient descent is applied directly to the residual sum of squares with sufficiently small initial values, then, under a proper early stopping rule, the iterates converge to a nearly sparse, rate-optimal solution that improves over explicitly regularized approaches. In particular, the resulting estimator does not suffer from the extra bias induced by explicit penalties, and can achieve the parametric root-n rate when the signal-to-noise ratio is sufficiently high. We also perform simulations comparing our method with explicitly regularized high-dimensional linear regression. Our results illustrate the advantages of implicit regularization via gradient descent after overparameterization in sparse vector estimation.
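The sketch below illustrates the general recipe described in the abstract (gradient descent on the residual sum of squares under an overparameterization, a small initialization, and early stopping), using the common Hadamard-product form $\beta = u \odot u - v \odot v$ as a stand-in. The parameterization, the initialization scale `alpha`, the step size `eta`, and the holdout-based stopping rule are illustrative assumptions for this example, not the paper's exact algorithm.

```python
import numpy as np

def gd_overparam(X, y, alpha=1e-3, eta=0.1, max_iter=5000, val_frac=0.2, seed=0):
    """Gradient descent on the residual sum of squares under a
    Hadamard-product overparameterization beta = u*u - v*v, with small
    initial values and early stopping on a held-out set.

    Illustrative sketch of implicit regularization only; alpha, eta and
    the holdout stopping rule are assumptions made for this example.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    val, tr = idx[:n_val], idx[n_val:]
    X_tr, y_tr, X_val, y_val = X[tr], y[tr], X[val], y[val]

    u = np.full(p, alpha)                       # small initial values
    v = np.full(p, alpha)
    best_beta, best_err = u * u - v * v, np.inf

    for _ in range(max_iter):
        beta = u * u - v * v
        r = X_tr @ beta - y_tr                  # residuals on the training part
        grad_beta = X_tr.T @ r / len(tr)        # gradient of RSS/(2m) w.r.t. beta
        u -= eta * 2.0 * u * grad_beta          # chain rule through +u*u
        v += eta * 2.0 * v * grad_beta          # chain rule through -v*v
        val_err = np.mean((X_val @ (u * u - v * v) - y_val) ** 2)
        if val_err < best_err:                  # early stopping: keep best held-out iterate
            best_err, best_beta = val_err, u * u - v * v
    return best_beta
```

With a sparse signal and a design satisfying restricted-isometry-type conditions of the kind the abstract refers to, the iterate selected by the held-out error typically recovers the support while the remaining coordinates stay near the initialization scale, whereas iterating far beyond that point would begin to fit the noise.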
Original language | English (US)
---|---
Pages (from-to) | 1033-1046
Number of pages | 14
Journal | Biometrika
Volume | 109
Issue number | 4
DOIs |
State | Published - Dec 1 2022
Keywords
- Early stopping
- Gradient descent
- High-dimensional regression
- Implicit regularization
- Overparameterization
ASJC Scopus subject areas
- Statistics and Probability
- General Mathematics
- Agricultural and Biological Sciences (miscellaneous)
- General Agricultural and Biological Sciences
- Statistics, Probability and Uncertainty
- Applied Mathematics