Topic cube: Topic modeling for OLAP on multidimensional text databases

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we propose a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and store probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose a heuristic method to speed up the iterative EM algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experiment results show that this heuristic method is much faster than the baseline method of computing each topic cube from scratch. We also discuss potential uses of topic cube and show sample experimental results.

Original languageEnglish (US)
Title of host publicationSociety for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages1117-1128
Number of pages12
StatePublished - Dec 1 2009
Event9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009May 2 2009

Publication series

NameSociety for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume3

Other

Other9th SIAM International Conference on Data Mining 2009, SDM 2009
CountryUnited States
CitySparks, NV
Period4/30/095/2/09

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Applied Mathematics

Fingerprint Dive into the research topics of 'Topic cube: Topic modeling for OLAP on multidimensional text databases'. Together they form a unique fingerprint.

Cite this