TY - GEN
T1 - Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023
AU - Mavromatis, Costas
AU - Ioannidis, Vassilis N.
AU - Wang, Shen
AU - Zheng, Da
AU - Adeshina, Soji
AU - Ma, Jun
AU - Zhao, Han
AU - Faloutsos, Christos
AU - Karypis, George
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode textual information of graphs achieve state-of-the-art performance in many node classification tasks. Yet, combining GNNs with LMs has not been widely explored for practical deployments due to its scalability issues. In this work, we tackle this challenge by developing a Graph-Aware Distillation framework (GraD) to encode graph structures into an LM for graph-free, fast inference. Different from conventional knowledge distillation, GraD jointly optimizes a GNN teacher and a graph-free student over the graph's nodes via a shared LM. This encourages the graph-free student to exploit graph information encoded by the GNN teacher while, at the same time, enabling the GNN teacher to better leverage textual information from unlabeled nodes. As a result, the teacher and the student models learn from each other to improve their overall performance. Experiments on eight node classification benchmarks in both transductive and inductive settings showcase GraD's superiority over existing distillation approaches for textual graphs. Our code and supplementary material are available at: https://github.com/cmavro/GRAD.
KW - Graph Neural Networks
KW - Knowledge Distillation
KW - Language Models
UR - http://www.scopus.com/inward/record.url?scp=85174443462&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174443462&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-43418-1_10
DO - 10.1007/978-3-031-43418-1_10
M3 - Conference contribution
AN - SCOPUS:85174443462
SN - 9783031434174
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 157
EP - 173
BT - Machine Learning and Knowledge Discovery in Databases
A2 - Koutra, Danai
A2 - Plant, Claudia
A2 - Gomez Rodriguez, Manuel
A2 - Baralis, Elena
A2 - Bonchi, Francesco
PB - Springer
Y2 - 18 September 2023 through 22 September 2023
ER -