Understanding evolution of research themes: A probabilistic generative model for citations

Xiaolong Wang, Chengxiang Zhai, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Understanding how research themes evolve over time in a research community is useful in many ways (e.g., revealing important milestones and discovering emerging major research trends). In this paper, we propose a novel way of analyzing literature citations to explore research topics and theme evolution by modeling article citation relations with a probabilistic generative model. The key idea is to represent a research paper by a "bag of citations" and model such a "citation document" with a probabilistic topic model. We explore the extension of a particular topic model, i.e., Latent Dirichlet Allocation (LDA), for citation analysis, and show that such a Citation-LDA can facilitate the discovery of individual research topics as well as the theme evolution from multiple related topics, both of which in turn lead to the construction of evolution graphs for characterizing research themes. We test the proposed Citation-LDA on two datasets: the ACL Anthology Network (AAN) of natural language research literature and the PubMed Central (PMC) archive of biomedical and life sciences literature, and demonstrate that Citation-LDA can effectively discover the evolution of research themes, with better formed topics than (conventional) Content-LDA.
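The "bag of citations" idea in the abstract can be illustrated with a minimal sketch. The code below treats each paper as a list of cited-paper IDs and fits plain LDA to those lists with a standard collapsed Gibbs sampler. This is an assumption-laden illustration, not the authors' implementation: the function name `citation_lda`, the hyperparameter values, the choice of sampler, and the toy paper IDs are all hypothetical.

```python
import random

def citation_lda(citation_docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit plain LDA to 'citation documents' by collapsed Gibbs sampling.

    citation_docs: list of lists; each inner list holds the IDs cited by one paper.
    Returns (vocab, phi, theta): the cited-paper IDs, the per-topic distribution
    over cited papers, and the per-paper distribution over topics.
    """
    rng = random.Random(seed)
    vocab = sorted({c for doc in citation_docs for c in doc})
    index = {c: i for i, c in enumerate(vocab)}
    docs = [[index[c] for c in doc] for doc in citation_docs]
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # citations in paper d assigned to topic k
    topic_cite = [[0] * V for _ in range(K)]  # times cited paper w is assigned to topic k
    topic_total = [0] * K                     # total citations assigned to topic k
    assign = []

    # Random initial topic assignment for every citation occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(K)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_cite[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_cite[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_cite[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_cite[z][w] += 1
                topic_total[z] += 1

    # Smoothed point estimates of the two families of distributions.
    phi = [[(topic_cite[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
           for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in range(K)]
             for d in range(len(docs))]
    return vocab, phi, theta

# Toy data with hypothetical paper IDs: two groups citing mostly disjoint sets of papers.
toy_docs = [
    ["A1", "A2", "A3"], ["A1", "A3", "A4"], ["A2", "A3", "A4"],
    ["B1", "B2", "B3"], ["B1", "B2", "B4"], ["B2", "B3", "B4"],
]
vocab, phi, theta = citation_lda(toy_docs, num_topics=2, iters=100)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [paper for _, paper in top])
```

In this reading, the papers with the highest probability under a topic are its most characteristic cited works. Building the evolution graphs mentioned in the abstract would need additional steps that this sketch does not attempt.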

Original language: English (US)
Title of host publication: KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: Rajesh Parekh, Jingrui He, Dhillon S. Inderjit, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy
Publisher: Association for Computing Machinery
Pages: 1115-1123
Number of pages: 9
ISBN (Electronic): 9781450321747
State: Published - Aug 11 2013
Event: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
Duration: Aug 11 2013 to Aug 14 2013

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: Part F128815

Other

Other: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Country/Territory: United States
City: Chicago
Period: 8/11/13 to 8/14/13

Keywords

  • Citation analysis
  • Theme evolution

ASJC Scopus subject areas

  • Software
  • Information Systems
