TY - GEN
T1 - Unsupervised feature selection for multi-view clustering on text-image web news data
AU - Qian, Mingjie
AU - Zhai, Chengxiang
N1 - Publisher Copyright:
Copyright 2014 ACM.
PY - 2014/11/3
Y1 - 2014/11/3
N2 - Unlabeled high-dimensional text-image web news data are produced every day, presenting new challenges to unsuper-vised feature selection on multi-view data. State-of-the-art multi-view unsupervised feature selection methods learn pseudo class labels by spectral analysis, which is sensitive to the choice of similarity metric for each view. For textimage data, the raw text itself contains more discriminative information than similarity graph which loses information during construction, and thus the text feature can be directly used for label learning, avoiding information loss as in spectral analysis. We propose a new multi-view unsupervised feature selection method in which image local learning regularized orthogonal nonnegative matrix factorization is used to learn pseudo labels and simultaneously robust joint l2,1-norm minimization is performed to select discriminative features. Cross-view consensus on pseudo labels can be obtained as much as possible. We systematically evaluate the proposed method in multi-view textimage web news datasets. Our extensive experiments on web news datasets crawled from two major US media channels: CNN and FOXNews demonstrate the efficacy of the new method over state-of-the-art multi-view and single-view unsupervised feature selection methods.
AB - Unlabeled high-dimensional text-image web news data are produced every day, presenting new challenges to unsuper-vised feature selection on multi-view data. State-of-the-art multi-view unsupervised feature selection methods learn pseudo class labels by spectral analysis, which is sensitive to the choice of similarity metric for each view. For textimage data, the raw text itself contains more discriminative information than similarity graph which loses information during construction, and thus the text feature can be directly used for label learning, avoiding information loss as in spectral analysis. We propose a new multi-view unsupervised feature selection method in which image local learning regularized orthogonal nonnegative matrix factorization is used to learn pseudo labels and simultaneously robust joint l2,1-norm minimization is performed to select discriminative features. Cross-view consensus on pseudo labels can be obtained as much as possible. We systematically evaluate the proposed method in multi-view textimage web news datasets. Our extensive experiments on web news datasets crawled from two major US media channels: CNN and FOXNews demonstrate the efficacy of the new method over state-of-the-art multi-view and single-view unsupervised feature selection methods.
KW - Multi-view unsupervised feature selection
UR - https://www.scopus.com/pages/publications/84937622297
UR - https://www.scopus.com/pages/publications/84937622297#tab=citedBy
U2 - 10.1145/2661829.2661993
DO - 10.1145/2661829.2661993
M3 - Conference contribution
AN - SCOPUS:84937622297
T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
SP - 1963
EP - 1966
BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Y2 - 3 November 2014 through 7 November 2014
ER -