ETM: Entity topic models for mining documents associated with entities

Hyungsul Kim, Yizhou Sun, Julia Hockenmaier, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Topic models, which factor each document into different topics and represent each topic as a distribution over terms, have been widely and successfully used to better understand collections of text documents. However, documents are also associated with further information, such as the set of real-world entities mentioned in them. For example, news articles are usually related to several people, organizations, countries or locations. Since those associated entities carry rich information, it is highly desirable to build more expressive, entity-based topic models, which can capture the term distributions for each topic, each entity, and each topic-entity pair. In this paper, we therefore introduce a novel Entity Topic Model (ETM) for documents that are associated with a set of entities. ETM not only models the generative process of a term given its topic and entity information, but also models the correlation between entity term distributions and topic term distributions. A Gibbs sampling-based algorithm is proposed to learn the model. Experiments on real datasets show that our approach is more effective than several state-of-the-art baselines.
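As a point of reference, the factorization described in the first sentence of the abstract is, in a standard LDA-style topic model, the mixture below. This is the generic baseline formulation, not ETM's own likelihood (which additionally conditions terms on entity information), and the symbols are illustrative notation assumed here rather than taken from the paper: K is the number of topics, \theta_{d,k} is document d's weight on topic k, and \phi_{k,w} is topic k's probability of term w.

```latex
p(w \mid d) \;=\; \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d) \;=\; \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k},
\qquad \sum_{k=1}^{K} \theta_{d,k} = 1, \quad \sum_{w} \phi_{k,w} = 1
```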

Original language: English (US)
Title of host publication: Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Pages: 349-358
Number of pages: 10
State: Published - 2012
Event: 12th IEEE International Conference on Data Mining, ICDM 2012 - Brussels, Belgium
Duration: Dec 10 2012 to Dec 13 2012

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 12th IEEE International Conference on Data Mining, ICDM 2012
Country/Territory: Belgium
City: Brussels
Period: 12/10/12 to 12/13/12

Keywords

  • Data mining
  • Entity
  • Topic models

ASJC Scopus subject areas

  • General Engineering
