Tensor Space Model for document analysis

Deng Cai, Xiaofei He, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can then be applied for indexing and classification. However, in some cases, the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
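
The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how terms are arranged into the matrix, so the row-major folding, the near-square shape, and the zero padding below are assumptions, and the names `vector_to_matrix` and `projection_sizes` are hypothetical. The parameter count only illustrates why a two-sided projection of a matrix can be much smaller than a single projection of the flattened vector; it is not the paper's own analysis or the TensorLSI algorithm.

```python
import math


def vector_to_matrix(vec, n_rows=None):
    """Fold a length-n term vector into an n_rows x n_cols matrix.

    Assumption: terms are laid out row by row and the tail is zero-padded.
    """
    n = len(vec)
    if n == 0:
        raise ValueError("term vector must not be empty")
    if n_rows is None:
        # Pick the smallest n_rows with n_rows * n_rows >= n (near-square shape).
        n_rows = math.isqrt(n)
        if n_rows * n_rows < n:
            n_rows += 1
    n_cols = -(-n // n_rows)  # ceiling division
    padded = list(vec) + [0.0] * (n_rows * n_cols - n)
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def projection_sizes(n_rows, n_cols, k_rows, k_cols):
    """Count projection entries for the vector view and the matrix view.

    Vector view: one (n_rows * n_cols) x (k_rows * k_cols) projection matrix.
    Matrix view: a left n_rows x k_rows and a right n_cols x k_cols matrix.
    """
    linear = (n_rows * n_cols) * (k_rows * k_cols)
    bilinear = n_rows * k_rows + n_cols * k_cols
    return linear, bilinear


# Hypothetical term weights for one document with a 7-term vocabulary.
doc = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 1.0]
mat = vector_to_matrix(doc)
sizes = projection_sizes(100, 100, 10, 10)
```

Under these assumptions, a 10,000-term document folded into a 100 x 100 matrix and reduced to 10 x 10 needs 2,000 projection entries in the matrix view, against 1,000,000 for a vector-space projection to the same 100 dimensions.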

Original language: English (US)
Title of host publication: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 625-626
Number of pages: 2
ISBN (Print): 1595933697, 9781595933690
State: Published - 2006
Event: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Seattle, WA, United States
Duration: Aug 6 2006 to Aug 11 2006

Publication series

Name: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Volume: 2006

Other

Other: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Country/Territory: United States
City: Seattle, WA
Period: 8/6/06 to 8/11/06

Keywords

  • Latent semantic indexing
  • Tensor latent semantic indexing
  • Tensor space model
  • Vector space model

ASJC Scopus subject areas

  • Engineering(all)
  • Information Systems
  • Software
  • Applied Mathematics
