MiTexCube: MicroTextCluster Cube for online analysis of text cells and its applications

Research output: Contribution to journal › Article › peer-review

Abstract

A fundamental problem of multidimensional text database analysis is to support, efficiently and effectively, various kinds of online applications, such as summarizing the content of a text cell or comparing contents across multiple text cells. In this paper, we propose a new infrastructure called MicroTextCluster Cube (or MiTexCube) that supports efficient online text analysis on multidimensional text databases by introducing micro-clusters of text documents as a compact representation of text content. Experimental results on real multidimensional text databases show that (i) MiTexCube can be materialized efficiently with reasonable space overhead, and (ii) applications built on the materialized MiTexCube are more efficient than the baseline method, which analyzes the individual documents in each cell directly, without sacrificing much analysis quality. Moreover, MiTexCube naturally accommodates a flexible trade-off between efficiency and quality of analysis.
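The abstract does not say how a micro-cluster is stored or combined. As an illustration only, the Python sketch below assumes each micro-cluster keeps an additive summary (a document count plus summed term frequencies), similar in spirit to the cluster-feature vectors of BIRCH. Additivity lets a cell be summarized, or two cells compared, from their micro-clusters alone, without rescanning documents. The names `MicroCluster`, `summarize_cell`, and `compare_cells` are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical additive summary of a group of documents
    (assumption: count + summed term frequencies, not the paper's definition)."""
    n_docs: int = 0
    term_sums: Counter = field(default_factory=Counter)

    def add_document(self, tokens: list[str]) -> None:
        # Absorb one tokenized document into the summary.
        self.n_docs += 1
        self.term_sums.update(tokens)

    def merge(self, other: MicroCluster) -> MicroCluster:
        # Summaries are additive, so clusters combine without revisiting documents.
        return MicroCluster(self.n_docs + other.n_docs,
                            self.term_sums + other.term_sums)

    def centroid(self) -> dict[str, float]:
        # Average frequency of each term per document.
        if self.n_docs == 0:
            return {}
        return {t: c / self.n_docs for t, c in self.term_sums.items()}


def _merge_all(clusters: list[MicroCluster]) -> MicroCluster:
    # Combine all micro-clusters of one cell into a single summary.
    total = MicroCluster()
    for mc in clusters:
        total = total.merge(mc)
    return total


def summarize_cell(clusters: list[MicroCluster], k: int = 3) -> list[str]:
    """Summarize a cell by the k most frequent terms across its micro-clusters."""
    return [t for t, _ in _merge_all(clusters).term_sums.most_common(k)]


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def compare_cells(cell_a: list[MicroCluster], cell_b: list[MicroCluster]) -> float:
    """Cosine similarity between the merged centroids of two cells."""
    return _cosine(_merge_all(cell_a).centroid(), _merge_all(cell_b).centroid())


if __name__ == "__main__":
    c1 = MicroCluster()
    c1.add_document(["oil", "price", "rise"])
    c1.add_document(["oil", "supply"])
    c2 = MicroCluster()
    c2.add_document(["oil", "price", "fall"])
    print(summarize_cell([c1, c2], k=2))  # ['oil', 'price']
    print(round(compare_cells([c1], [c2]), 3))
```

In this sketch the cost of summarizing or comparing cells depends on the number of micro-clusters rather than the number of documents. Keeping fewer, coarser micro-clusters is one plausible way to trade quality for speed, but the paper's actual mechanism may differ.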

Original language: English (US)
Pages (from-to): 243-259
Number of pages: 17
Journal: Statistical Analysis and Data Mining
Volume: 6
Issue number: 3
State: Published - Jun 1 2013

Keywords

  • MiTexCube
  • Multidimensional text database
  • Text mining

ASJC Scopus subject areas

  • Analysis
  • Information Systems
  • Computer Science Applications
