TY - GEN

T1 - Breakdown point of model selection when the number of variables exceeds the number of observations

AU - Donoho, David

AU - Stodden, Victoria

N1 - Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2006

Y1 - 2006

N2 - The classical multivariate linear regression problem assumes p variables X1, X2, ..., Xp and a response vector y, each with n observations, and a linear relationship between the two: y = Xβ + z, where z ∼ N(0, σ²). We point out that when p > n, there is a breakdown point for standard model selection schemes, such that model selection only works well below a certain critical complexity level depending on n/p. We apply this notion to some standard model selection algorithms (Forward Stepwise, LASSO, LARS) in the case where p ≫ n. We find that 1) the breakdown point is well-defined for random X-models and low noise, 2) increasing noise shifts the breakdown point to lower levels of sparsity, and reduces the model recovery ability of the algorithm in a systematic way, and 3) below breakdown, the size of coefficient errors follows the theoretical error distribution for the classical linear model.

AB - The classical multivariate linear regression problem assumes p variables X1, X2, ..., Xp and a response vector y, each with n observations, and a linear relationship between the two: y = Xβ + z, where z ∼ N(0, σ²). We point out that when p > n, there is a breakdown point for standard model selection schemes, such that model selection only works well below a certain critical complexity level depending on n/p. We apply this notion to some standard model selection algorithms (Forward Stepwise, LASSO, LARS) in the case where p ≫ n. We find that 1) the breakdown point is well-defined for random X-models and low noise, 2) increasing noise shifts the breakdown point to lower levels of sparsity, and reduces the model recovery ability of the algorithm in a systematic way, and 3) below breakdown, the size of coefficient errors follows the theoretical error distribution for the classical linear model.

UR - http://www.scopus.com/inward/record.url?scp=40649103930&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=40649103930&partnerID=8YFLogxK

U2 - 10.1109/ijcnn.2006.246934

DO - 10.1109/ijcnn.2006.246934

M3 - Conference contribution

AN - SCOPUS:40649103930

SN - 0780394909

SN - 9780780394902

T3 - IEEE International Conference on Neural Networks - Conference Proceedings

SP - 1916

EP - 1921

BT - International Joint Conference on Neural Networks 2006, IJCNN '06

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - International Joint Conference on Neural Networks 2006, IJCNN '06

Y2 - 16 July 2006 through 21 July 2006

ER -