Unsupervised feature selection for multi-view clustering on text-image web news data

Mingjie Qian, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unlabeled high-dimensional text-image web news data are produced every day, presenting new challenges to unsupervised feature selection on multi-view data. State-of-the-art multi-view unsupervised feature selection methods learn pseudo class labels by spectral analysis, which is sensitive to the choice of similarity metric for each view. For text-image data, the raw text itself contains more discriminative information than a similarity graph, which loses information during construction. The text features can therefore be used directly for label learning, avoiding the information loss incurred by spectral analysis. We propose a new multi-view unsupervised feature selection method in which image local learning regularized orthogonal nonnegative matrix factorization is used to learn pseudo labels while robust joint l2,1-norm minimization is simultaneously performed to select discriminative features. The method seeks as much cross-view consensus on the pseudo labels as possible. We systematically evaluate the proposed method on multi-view text-image web news datasets. Our extensive experiments on web news datasets crawled from two major US media channels, CNN and FOXNews, demonstrate the efficacy of the new method over state-of-the-art multi-view and single-view unsupervised feature selection methods.
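
To make the l2,1-norm mentioned in the abstract concrete, the following is a minimal NumPy sketch of its standard definition (the sum of the Euclidean norms of a matrix's rows) and of the common practice of ranking features by row norm. It is an illustration only, not the authors' algorithm: the function names `l21_norm` and `rank_features`, the toy matrix `W`, and the assumption that each row of `W` corresponds to one feature are all hypothetical.

```python
import numpy as np


def l21_norm(W):
    """Return the l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features(W, k):
    """Return the indices of the k rows of W with the largest Euclidean norm.

    Assumption for this sketch: row i of W holds the weights of feature i,
    so a row with a large norm marks a feature that is used heavily.
    """
    row_norms = np.linalg.norm(W, axis=1)
    order = np.argsort(-row_norms, kind="stable")  # descending by row norm
    return order[:k]


# Toy weight matrix (hypothetical values): 3 features, 2 pseudo classes.
W = np.array([
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0, i.e. a feature that would be discarded
    [1.0, 0.0],  # row norm 1
])

print(l21_norm(W))
print(rank_features(W, 2))
```

Penalizing this norm pushes entire rows toward zero, which is why it is widely used for feature selection: a zeroed row removes the corresponding feature for every pseudo class at once.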

Original language: English (US)
Title of host publication: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery, Inc
Pages: 1963-1966
Number of pages: 4
ISBN (Electronic): 9781450325981
State: Published - Nov 3 2014
Event: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China
Duration: Nov 3 2014 to Nov 7 2014

Publication series

Name: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Other

Other: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Country/Territory: China
City: Shanghai
Period: 11/3/14 to 11/7/14

Keywords

  • Multi-view unsupervised feature selection

ASJC Scopus subject areas

  • Information Systems and Management
  • Computer Science Applications
  • Information Systems
