TY - GEN
T1 - ETM
T2 - 12th IEEE International Conference on Data Mining, ICDM 2012
AU - Kim, Hyungsul
AU - Sun, Yizhou
AU - Hockenmaier, Julia
AU - Han, Jiawei
PY - 2012
Y1 - 2012
N2 - Topic models, which factor each document into different topics and represent each topic as a distribution of terms, have been widely and successfully used to better understand collections of text documents. However, documents are also associated with further information, such as the set of real-world entities mentioned in them. For example, news articles are usually related to several people, organizations, countries or locations. Since those associated entities carry rich information, it is highly desirable to build more expressive, entity-based topic models, which can capture the term distributions for each topic, each entity, as well as each topic-entity pair. In this paper, we therefore introduce a novel Entity Topic Model (ETM) for documents that are associated with a set of entities. ETM not only models the generative process of a term given its topic and entity information, but also models the correlation of entity term distributions and topic term distributions. A Gibbs sampling-based algorithm is proposed to learn the model. Experiments on real datasets demonstrate the effectiveness of our approach over several state-of-the-art baselines.
KW - Data mining
KW - Entity
KW - Topic models
UR - http://www.scopus.com/inward/record.url?scp=84874041058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874041058&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2012.107
DO - 10.1109/ICDM.2012.107
M3 - Conference contribution
AN - SCOPUS:84874041058
SN - 9780769549057
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 349
EP - 358
BT - Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Y2 - 10 December 2012 through 13 December 2012
ER -