TY - GEN
T1 - Towards feature selection in network
AU - Gu, Quanquan
AU - Han, Jiawei
PY - 2011/12/13
Y1 - 2011/12/13
N2 - Traditional feature selection methods assume that the data are independent and identically distributed (i.i.d.). However, in the real world, a tremendous amount of data is distributed in networks. Existing feature selection methods are not suited for networked data because the i.i.d. assumption no longer holds. This motivates us to study feature selection in a network. In this paper, we present a supervised feature selection method based on Laplacian Regularized Least Squares (LapRLS) for networked data. Specifically, we use linear regression to exploit the content information and adopt graph regularization to incorporate the link information. The proposed feature selection method aims at selecting a subset of features such that the empirical error of LapRLS is minimized. The resulting optimization problem is a mixed integer program, which is difficult to solve. It is relaxed into an L2,1-norm constrained LapRLS problem and solved by an accelerated proximal gradient descent algorithm. Experiments on benchmark networked data sets show that the proposed feature selection method outperforms traditional feature selection methods and state-of-the-art learning-in-network approaches.
AB - Traditional feature selection methods assume that the data are independent and identically distributed (i.i.d.). However, in the real world, a tremendous amount of data is distributed in networks. Existing feature selection methods are not suited for networked data because the i.i.d. assumption no longer holds. This motivates us to study feature selection in a network. In this paper, we present a supervised feature selection method based on Laplacian Regularized Least Squares (LapRLS) for networked data. Specifically, we use linear regression to exploit the content information and adopt graph regularization to incorporate the link information. The proposed feature selection method aims at selecting a subset of features such that the empirical error of LapRLS is minimized. The resulting optimization problem is a mixed integer program, which is difficult to solve. It is relaxed into an L2,1-norm constrained LapRLS problem and solved by an accelerated proximal gradient descent algorithm. Experiments on benchmark networked data sets show that the proposed feature selection method outperforms traditional feature selection methods and state-of-the-art learning-in-network approaches.
KW - Laplacian regularized least squares
KW - feature selection
KW - graph regularization
KW - network
UR - http://www.scopus.com/inward/record.url?scp=83055161693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83055161693&partnerID=8YFLogxK
U2 - 10.1145/2063576.2063746
DO - 10.1145/2063576.2063746
M3 - Conference contribution
AN - SCOPUS:83055161693
SN - 9781450307178
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1175
EP - 1184
BT - CIKM'11 - Proceedings of the 2011 ACM International Conference on Information and Knowledge Management
T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11
Y2 - 24 October 2011 through 28 October 2011
ER -