TY - GEN
T1 - Tensor Space Model for document analysis
AU - Cai, Deng
AU - He, Xiaofei
AU - Han, Jiawei
PY - 2006
Y1 - 2006
N2 - Vector Space Model (VSM) has been at the core of information retrieval for decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can then be applied for indexing and classification. However, in some cases, the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
AB - Vector Space Model (VSM) has been at the core of information retrieval for decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can then be applied for indexing and classification. However, in some cases, the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
KW - Latent semantic indexing
KW - Tensor latent semantic indexing
KW - Tensor space model
KW - Vector space model
UR - http://www.scopus.com/inward/record.url?scp=33750374181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750374181&partnerID=8YFLogxK
U2 - 10.1145/1148170.1148287
DO - 10.1145/1148170.1148287
M3 - Conference contribution
AN - SCOPUS:33750374181
SN - 1595933697
SN - 9781595933690
T3 - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 625
EP - 626
BT - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
T2 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 6 August 2006 through 11 August 2006
ER -