Document clustering using locality preserving indexing

Deng Cai, Xiaofei He, Jiawei Han

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a novel document clustering method that aims to cluster documents into different semantic classes. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. Unlike previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which provides the intuitive motivation for our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
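
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly stated locality-preserving formulation (a nearest-neighbor graph over the documents, followed by a generalized eigenproblem on the graph Laplacian, solved in an SVD subspace). The function name `lpi_embed`, the binary cosine k-nearest-neighbor graph, the parameter defaults, and the toy term counts are all hypothetical choices made for illustration.

```python
import numpy as np

def lpi_embed(X, n_neighbors=2, n_components=2, tol=1e-6):
    """Sketch of an LPI-style projection (illustrative, not the paper's code).

    X: (n_docs, n_terms) array of term weights, one row per document.
    Returns (Y, eigvals): Y is the (n_docs, n_components) embedding.
    """
    # Unit-normalize rows so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = X.shape[0]

    # Assumption: binary k-nearest-neighbor graph under cosine similarity.
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)              # a document is not its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    W = np.zeros((n, n))
    W[rows, nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize the graph

    D = np.diag(W.sum(axis=1))                # degree matrix
    L = D - W                                 # graph Laplacian

    # SVD step: keep directions with non-negligible singular values so the
    # constraint matrix below is positive definite.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    keep = s > tol * s[0]
    Z = U[:, keep] * s[keep]                  # documents in the SVD subspace

    # Generalized eigenproblem (Z^T L Z) a = lambda (Z^T D Z) a,
    # solved by Cholesky whitening; the smallest eigenvalues are kept.
    A = Z.T @ L @ Z
    B = Z.T @ D @ Z
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)   # ascending order
    a = C_inv.T @ eigvecs[:, :n_components]
    return Z @ a, eigvals[:n_components]

# Hypothetical toy term counts: 6 documents over 5 terms.
docs = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [2, 2, 1, 0, 0],
    [0, 0, 2, 1, 1],
    [0, 1, 1, 2, 0],
    [0, 0, 1, 2, 2],
], dtype=float)

Y, eigvals = lpi_embed(docs)
print(Y.shape)
```

A standard clustering algorithm (k-means, for example) would then be run on the rows of `Y` rather than on the original high-dimensional term vectors. The neighborhood size and the number of retained dimensions are tuning choices that this sketch does not attempt to justify.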

Original language: English (US)
Pages (from-to): 1624-1637
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 17
Issue number: 12
State: Published - Dec 2005

Keywords

  • Dimensionality reduction
  • Document clustering
  • Locality preserving indexing
  • Semantics

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
