TY - GEN
T1 - FeaFiner: Biomarker identification from medical data through feature generalization and selection
T2 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
AU - Zhou, Jiayu
AU - Lu, Zhaosong
AU - Sun, Jimeng
AU - Yuan, Lei
AU - Wang, Fei
AU - Ye, Jieping
N1 - Publisher Copyright: © 2013 ACM.
PY - 2013/8/11
Y1 - 2013/8/11
N2 - Traditionally, feature construction and feature selection are two important but separate processes in data mining. However, many real-world applications require an integrated approach for creating, refining, and selecting features. To address this problem, we propose FeaFiner (short for Feature Refiner), an efficient formulation that simultaneously generalizes low-level features into higher-level concepts and selects relevant concepts based on the target variable. Specifically, we formulate a double sparsity optimization problem that identifies groups in the low-level features, generalizes higher-level features using the groups, and performs feature selection. Because nonoverlapping groups are preferred in many clinical studies for better interpretability, we further improve the formulation to generalize features using mutually exclusive feature groups. The proposed formulation is challenging to solve due to the orthogonality constraints, the non-convex objective, and the non-smooth penalties. We apply a recently developed augmented Lagrangian method to solve this formulation, in which each subproblem is solved by a non-monotone spectral projected gradient method. Our numerical experiments show that this approach is computationally efficient and capable of producing high-quality solutions. We also present a generalization bound showing the consistency and the asymptotic behavior of the learning process of our proposed formulation. Finally, the proposed FeaFiner method is validated on the Alzheimer's Disease Neuroimaging Initiative dataset, where low-level biomarkers are automatically generalized into robust higher-level concepts, which are then selected for predicting the disease status measured by the Mini Mental State Examination and the Alzheimer's Disease Assessment Scale cognitive subscore. Compared to existing predictive modeling methods, FeaFiner provides intuitive and robust feature concepts and competitive predictive accuracy.
KW - Augmented lagrangian
KW - Biomarkers
KW - Feature generalization
KW - Feature selection
KW - Sparse learning
KW - Spectral gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85002121785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85002121785&partnerID=8YFLogxK
U2 - 10.1145/2487575.2487671
DO - 10.1145/2487575.2487671
M3 - Conference contribution
AN - SCOPUS:85002121785
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1034
EP - 1042
BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2 - Parekh, Rajesh
A2 - He, Jingrui
A2 - Dhillon, Inderjit S.
A2 - Bradley, Paul
A2 - Koren, Yehuda
A2 - Ghani, Rayid
A2 - Senator, Ted E.
A2 - Grossman, Robert L.
A2 - Uthurusamy, Ramasamy
PB - Association for Computing Machinery
Y2 - 11 August 2013 through 14 August 2013
ER -