Revisiting Citation Prediction with Cluster-Aware Text-Enhanced Heterogeneous Graph Neural Networks

Carl Yang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Numerous papers get published all the time. However, some papers are born to be well-cited while others are not. In this work, we revisit the important problem of citation prediction, focusing on the realistic task of predicting the average number of citations a paper will attract per year. The task is nonetheless challenging because many correlated factors underlie the potential impact of a paper, such as the prestige of its authors, the authority of its publishing venue, and the significance of the problems/techniques/applications it studies. To jointly model these factors, we propose to construct a heterogeneous publication network whose nodes include papers, authors, venues, and terms. Moreover, we devise a novel heterogeneous graph neural network (HGN) to jointly embed all types of nodes and links, so as to model research impact and its propagation. Beyond graph heterogeneity, we find it also important to consider the latent research domains, because the same nodes can have different impacts within different communities. Therefore, we further devise a novel cluster-aware (CA) module, which models all nodes and their interactions under the proper contexts of research domains. Finally, to exploit the information-rich texts associated with papers, we devise a novel text-enhancing (TE) module for automatic quality term mining. Using real-world publication data from DBLP, we construct three different networks and conduct comprehensive experiments to evaluate our proposed CATE-HGN framework against various state-of-the-art models. Rich quantitative results and qualitative case studies demonstrate the superiority of CATE-HGN in citation prediction on publication networks, and indicate its general advantages in various relevant downstream tasks on text-rich heterogeneous networks.
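
As a rough illustration of the heterogeneous publication network and the prediction target described in the abstract, the following is a minimal sketch and not the authors' implementation. The relation names (written_by, published_in, mentions), the record fields, the toy data, and the exact definition of "average citations per year" (total citations divided by years since publication) are all assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: typed node sets and typed edge sets (hypothetical helper)."""
    # node type -> set of node ids
    nodes: dict = field(default_factory=lambda: defaultdict(set))
    # (source type, relation, target type) -> set of (source id, target id)
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, src_type, src, relation, dst_type, dst):
        self.nodes[src_type].add(src)
        self.nodes[dst_type].add(dst)
        self.edges[(src_type, relation, dst_type)].add((src, dst))


def build_publication_graph(papers):
    """Link each paper to its authors, venue, and terms (relation names are assumed)."""
    g = HeteroGraph()
    for p in papers:
        g.nodes["paper"].add(p["id"])
        for author in p["authors"]:
            g.add_edge("paper", p["id"], "written_by", "author", author)
        g.add_edge("paper", p["id"], "published_in", "venue", p["venue"])
        for term in p["terms"]:
            g.add_edge("paper", p["id"], "mentions", "term", term)
    return g


def avg_citations_per_year(total_citations, pub_year, current_year):
    """Assumed label definition: total citations over years elapsed (at least 1)."""
    years = max(current_year - pub_year, 1)
    return total_citations / years


# Made-up records, for illustration only.
papers = [
    {"id": "p1", "authors": ["a1", "a2"], "venue": "v1",
     "terms": ["graph", "citation"], "year": 2018, "citations": 50},
    {"id": "p2", "authors": ["a2"], "venue": "v1",
     "terms": ["graph"], "year": 2021, "citations": 6},
]

graph = build_publication_graph(papers)
labels = {
    p["id"]: avg_citations_per_year(p["citations"], p["year"], 2023)
    for p in papers
}
```

In this sketch, `labels` would serve as the regression target for paper nodes, while `graph.edges` groups links by type so that a heterogeneous model can treat each relation separately.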

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
Publisher: IEEE Computer Society
Number of pages: 14
ISBN (Electronic): 9798350322279
State: Published - 2023
Event: 39th IEEE International Conference on Data Engineering, ICDE 2023 - Anaheim, United States
Duration: Apr 3 2023 to Apr 7 2023

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627


Conference: 39th IEEE International Conference on Data Engineering, ICDE 2023
Country/Territory: United States


  • conditional network embedding
  • generative adversarial networks
  • hierarchical network embedding

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

