TY - GEN
T1 - Understanding evolution of research themes
T2 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
AU - Wang, Xiaolong
AU - Zhai, Chengxiang
AU - Roth, Dan
N1 - Funding Information:
Our work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.
Publisher Copyright:
Copyright © 2013 ACM.
PY - 2013/8/11
Y1 - 2013/8/11
N2 - Understanding how research themes evolve over time in a research community is useful in many ways (e.g., revealing important milestones and discovering emerging major research trends). In this paper, we propose a novel way of analyzing literature citations to explore research topics and theme evolution by modeling article citation relations with a probabilistic generative model. The key idea is to represent a research paper by a "bag of citations" and model such a "citation document" with a probabilistic topic model. We explore the extension of a particular topic model, i.e., Latent Dirichlet Allocation (LDA), for citation analysis, and show that such a Citation-LDA can facilitate the discovery of individual research topics as well as the theme evolution from multiple related topics, both of which in turn lead to the construction of evolution graphs for characterizing research themes. We test the proposed Citation-LDA on two datasets: the ACL Anthology Network (AAN) of natural language research literature and the PubMed Central (PMC) archive of biomedical and life sciences literature, and demonstrate that Citation-LDA can effectively discover the evolution of research themes, with better-formed topics than (conventional) Content-LDA.
KW - Citation analysis
KW - Theme evolution
UR - http://www.scopus.com/inward/record.url?scp=84904553274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904553274&partnerID=8YFLogxK
U2 - 10.1145/2487575.2487698
DO - 10.1145/2487575.2487698
M3 - Conference contribution
AN - SCOPUS:84904553274
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1115
EP - 1123
BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2 - Parekh, Rajesh
A2 - He, Jingrui
A2 - Dhillon, Inderjit S.
A2 - Bradley, Paul
A2 - Koren, Yehuda
A2 - Ghani, Rayid
A2 - Senator, Ted E.
A2 - Grossman, Robert L.
A2 - Uthurusamy, Ramasamy
PB - Association for Computing Machinery
Y2 - 11 August 2013 through 14 August 2013
ER -