Abstract
A fundamental problem in multidimensional text database analysis is how to support various kinds of online applications efficiently and effectively, such as summarizing the content of a text cell or comparing the contents of multiple text cells. In this paper, we propose a new infrastructure called MicroTextCluster Cube (or MiTexCube) to support efficient online text analysis on multidimensional text databases by introducing micro-clusters of text documents as a compact representation of text content. Experimental results on real multidimensional text databases show that (i) MiTexCube can be materialized efficiently with reasonable space overhead, and (ii) applications based on the materialized MiTexCube are more efficient than the baseline method of direct analysis based on document units in each cell, without sacrificing much quality of analysis. Moreover, MiTexCube naturally accommodates a flexible trade-off between efficiency and quality of analysis.
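To make the idea of a compact cell representation concrete, the following is a minimal sketch, not the paper's actual data structure. It assumes a micro-cluster can be kept as an additive summary (a document count plus aggregated term counts), so that cells can be merged and summarized without revisiting the underlying documents. The names `MicroCluster`, `summarize`, and `roll_up`, the cell keys, and the toy tokens are all hypothetical; how documents are grouped into micro-clusters is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Additive summary of a group of documents (assumed layout, not the paper's)."""
    n_docs: int = 0
    term_counts: Counter = field(default_factory=Counter)

    def add(self, tokens):
        # Fold one tokenized document into the summary.
        self.n_docs += 1
        self.term_counts.update(tokens)

    def merge(self, other):
        # Summaries are additive, so merging never touches raw documents.
        return MicroCluster(self.n_docs + other.n_docs,
                            self.term_counts + other.term_counts)


def summarize(clusters, k=3):
    """Return the k most frequent terms across the given micro-clusters."""
    total = Counter()
    for cluster in clusters:
        total.update(cluster.term_counts)
    return [term for term, _ in total.most_common(k)]


def roll_up(cells, keep_dim):
    """Aggregate cells onto one dimension by merging their summaries."""
    out = {}
    for key, cluster in cells.items():
        value = key[keep_dim]
        out[value] = out[value].merge(cluster) if value in out else cluster
    return out


# Toy data: cells keyed by (year, location), one micro-cluster per cell.
a = MicroCluster()
a.add(["engine", "failure", "engine"])
b = MicroCluster()
b.add(["engine", "delay"])
cells = {("2012", "LAX"): a, ("2012", "ORD"): b, ("2013", "LAX"): b}
by_year = roll_up(cells, keep_dim=0)
```

In a sketch like this, the number of micro-clusters kept per cell would be the knob for the efficiency and quality trade-off the abstract mentions: fewer, coarser summaries are cheaper to merge but lose more detail.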
Original language | English (US) |
---|---|
Pages (from-to) | 243-259 |
Number of pages | 17 |
Journal | Statistical Analysis and Data Mining |
Volume | 6 |
Issue number | 3 |
State | Published - Jun 2013 |
Keywords
- MiTexCube
- Multidimensional text database
- Text mining
ASJC Scopus subject areas
- Analysis
- Information Systems
- Computer Science Applications