Breakdown point of model selection when the number of variables exceeds the number of observations

David Donoho, Victoria Stodden

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The classical multivariate linear regression problem assumes p variables X1, X2, . . ., Xp and a response vector y, each with n observations, and a linear relationship between the two: y = Xß + z, where z ∼ N(O,σ2). We point out that when p > n, there is a breakdown point for standard model selection schemes, such that model selection only works well below a certain critical complexity level depending on n/p. We apply this notion to some standard model selection algorithms (Forward Stepwise, LASSO, LARS) in the case where p ≫ n. We and that 1) the breakdown point is well-de ned for random X -models and low noise, 2) increasing noise shifts the breakdown point to lower levels of sparsity, and reduces the model recovery ability of the algorithm in a systematic way, and 3) below breakdown, the size of coefcient errors follows the theoretical error distribution for the classical linear model.

Original languageEnglish (US)
Title of host publicationInternational Joint Conference on Neural Networks 2006, IJCNN '06
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1916-1921
Number of pages6
ISBN (Print)0780394909, 9780780394902
DOIs
StatePublished - 2006
Externally publishedYes
EventInternational Joint Conference on Neural Networks 2006, IJCNN '06 - Vancouver, BC, Canada
Duration: Jul 16 2006Jul 21 2006

Publication series

NameIEEE International Conference on Neural Networks - Conference Proceedings
ISSN (Print)1098-7576

Other

OtherInternational Joint Conference on Neural Networks 2006, IJCNN '06
Country/TerritoryCanada
CityVancouver, BC
Period7/16/067/21/06

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'Breakdown point of model selection when the number of variables exceeds the number of observations'. Together they form a unique fingerprint.

Cite this