Scalable moment-based inference for latent Dirichlet allocation

Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Topic models such as Latent Dirichlet Allocation are text analysis methods of wide interest. Recently, moment-based inference with provable performance has been proposed for topic models. Compared with inference algorithms that approximate the maximum likelihood objective, moment-based inference has theoretical guarantees for recovering model parameters. One such inference method is tensor orthogonal decomposition, which requires only mild assumptions for exact recovery of topics. However, it suffers from scalability issues due to the creation of dense, high-dimensional tensors. In this work, we propose a speedup technique that leverages the special structure of the tensors. It is efficient in both time and space, and requires scanning the corpus only twice. It improves over the state-of-the-art inference algorithm by one to three orders of magnitude while preserving equal inference ability.

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings
Publisher: Springer
Pages: 290-305
Number of pages: 16
Edition: PART 3
ISBN (Print): 9783662448441
State: Published - 2014
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014 - Nancy, France
Duration: Sep 15 2014 → Sep 19 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 3
Volume: 8726 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014
Country/Territory: France
City: Nancy
Period: 9/15/14 → 9/19/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
