Automatic labeling of multinomial topic models

Qiaozhu Mei, Xuehua Shen, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually in a subjective way. In this paper, we propose probabilistic approaches to automatically labeling multinomial topic models in an objective way. We cast this labeling problem as an optimization problem involving minimizing Kullback-Leibler divergence between word distributions and maximizing mutual information between a label and a topic model. Experiments with user study have been done on two text data sets with different genres. The results show that the proposed labeling methods are quite effective to generate labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models such as PLSA, LDA, and their variations.
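
As a rough illustration of the idea of scoring labels by Kullback-Leibler divergence between word distributions, the sketch below ranks candidate labels by how closely a word distribution associated with each label matches the topic's word distribution. It is a minimal sketch under stated assumptions, not the paper's actual algorithm: the function names `kl_divergence` and `rank_labels`, the epsilon smoothing, and the toy distributions are all hypothetical and chosen only for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word distributions given as dicts.

    Words missing from q are given a tiny probability eps (an assumption
    made here to avoid division by zero; not taken from the paper).
    """
    return sum(
        pw * math.log(pw / max(q.get(w, 0.0), eps))
        for w, pw in p.items()
        if pw > 0.0
    )

def rank_labels(topic, candidates):
    """Sort candidate labels by ascending KL(topic || label distribution)."""
    scores = {label: kl_divergence(topic, dist) for label, dist in candidates.items()}
    return sorted(scores, key=scores.get)

# Toy, made-up distributions (illustrative only, not data from the paper).
topic = {"cluster": 0.5, "data": 0.3, "algorithm": 0.2}
candidates = {
    "clustering algorithm": {"cluster": 0.5, "algorithm": 0.3, "data": 0.2},
    "query processing": {"query": 0.6, "data": 0.3, "index": 0.1},
}
ranked = rank_labels(topic, candidates)
```

In this toy setup the label whose word distribution overlaps most with the topic receives the smallest divergence and is ranked first; a label sharing few words with the topic is penalized heavily by the smoothing term.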

Original language: English (US)
Title of host publication: KDD-2007
Subtitle of host publication: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 490-499
Number of pages: 10
State: Published - Dec 14 2007
Event: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration: Aug 12 2007 to Aug 15 2007

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: San Jose, CA
Period: 8/12/07 to 8/15/07

Keywords

  • Multinomial distribution
  • Statistical topic models
  • Topic model labeling

ASJC Scopus subject areas

  • Software
  • Information Systems
