TY - GEN
T1 - Patient risk prediction model via top-κ stability selection
AU - Zhou, Jiayu
AU - Sun, Jimeng
AU - Liu, Yashu
AU - Hu, Jianying
AU - Ye, Jieping
N1 - Publisher Copyright: Copyright © SIAM.
PY - 2013/1/1
Y1 - 2013/1/1
AB - The patient risk prediction model aims to assess a patient's risk of developing a target disease based on his/her health profile. As electronic health records (EHRs) become more prevalent, a large number of features can be constructed to characterize patient profiles. This wealth of data provides unprecedented opportunities for data mining researchers to address important biomedical questions. Practical data mining challenges include: How can these features be correctly selected and ranked based on their predictive power? Which predictive model performs best in predicting a target disease using these features? In this paper, we propose top-κ stability selection, which generalizes a powerful sparse learning method for feature selection by overcoming its limitation on parameter selection. In particular, our proposed top-κ stability selection includes the original stability selection method as a special case when κ = 1. Moreover, we show that top-κ stability selection is more robust than the original stability selection because it uses more of the information contained in the selection probabilities, and that it has stronger theoretical properties. On a large set of real clinical prediction datasets, the top-κ stability selection methods outperform many existing feature selection methods, including the original stability selection. We also compare three competitive classification methods (SVM, logistic regression, and random forest) to demonstrate the effectiveness of the features selected by our proposed method in clinical prediction applications. Finally, through several clinical applications on predicting heart-failure-related symptoms, we show that top-κ stability selection can successfully identify important features that are clinically meaningful.
UR - http://www.scopus.com/inward/record.url?scp=84960127630&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960127630&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972832.7
DO - 10.1137/1.9781611972832.7
M3 - Conference contribution
AN - SCOPUS:84960127630
T3 - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
SP - 55
EP - 63
BT - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
A2 - Parthasarathy, Srinivasan
A2 - Ghosh, Joydeep
A2 - Zhou, Zhi-Hua
A2 - Dy, Jennifer
A2 - Obradovic, Zoran
A2 - Kamath, Chandrika
PB - Society for Industrial and Applied Mathematics
T2 - SIAM International Conference on Data Mining, SDM 2013
Y2 - 2 May 2013 through 4 May 2013
ER -