TY - JOUR
T1 - Precision Lasso
T2 - Accounting for correlations and linear dependencies in high-dimensional genomic data
AU - Wang, Haohan
AU - Lengerich, Benjamin J.
AU - Aragam, Bryon
AU - Xing, Eric P.
N1 - Funding Information:
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development centre. This work is also supported by the National Institutes of Health grants R01-GM093156 and P30-DA035778.
Publisher Copyright:
© 2018 The Author(s). Published by Oxford University Press.
PY - 2019/4/1
Y1 - 2019/4/1
N2 - Motivation: Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suffer from unstable selections of correlated variables and inconsistent selections of linearly dependent variables. Unfortunately, as we demonstrate empirically, such problematic situations of correlated and linearly dependent variables often exist in genomic datasets and lead to under-performance of classical methods of variable selection. Results: To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression.
UR - http://www.scopus.com/inward/record.url?scp=85063296081&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063296081&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty750
DO - 10.1093/bioinformatics/bty750
M3 - Article
C2 - 30184048
AN - SCOPUS:85063296081
SN - 1367-4803
VL - 35
SP - 1181
EP - 1187
JO - Bioinformatics
JF - Bioinformatics
IS - 7
ER -