TY - GEN
T1 - iTopicModel: Information Network-Integrated Topic Modeling
T2 - 9th IEEE International Conference on Data Mining, ICDM 2009
AU - Sun, Yizhou
AU - Han, Jiawei
AU - Gao, Jing
AU - Yu, Yintao
PY - 2009
Y1 - 2009
N2 - Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online data. In this paper, we propose a novel topic modeling framework for document networks, which builds a unified generative topic model that considers both the text and the structure information of documents. A graphical model is proposed to describe the generative process. On the top layer of this graphical model, we define a novel multivariate Markov Random Field over the topic distribution random variables of the documents, to model the dependency relationships among documents over the network structure. On the bottom layer, we follow the traditional topic model to model the generation of text for each document. A joint distribution function for both the text and the structure of the documents is thus provided. A solution to estimate this topic model is given by maximizing the log-likelihood of the joint probability. Some important practical issues in real applications are also discussed, including how to determine the number of topics and how to choose a good network structure. We apply the model to two real datasets, DBLP and Cora, and the experiments show that it is more effective than state-of-the-art topic modeling algorithms.
KW - Document networks
KW - Markov Random Field
KW - Topic model
UR - http://www.scopus.com/inward/record.url?scp=77951153812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951153812&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2009.43
DO - 10.1109/ICDM.2009.43
M3 - Conference contribution
AN - SCOPUS:77951153812
SN - 9780769538952
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 493
EP - 502
BT - ICDM 2009 - The 9th IEEE International Conference on Data Mining
Y2 - 6 December 2009 through 9 December 2009
ER -