TY - GEN
T1 - Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
AU - Zhu, Qi
AU - Ponomareva, Natalia
AU - Han, Jiawei
AU - Perozzi, Bryan
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - There has been a recent surge of interest in designing Graph Neural Networks (GNNs) for semi-supervised learning tasks. Unfortunately, this work has assumed that the nodes labeled for use in training were selected uniformly at random (i.e., are an IID sample). However, in many real-world scenarios gathering labels for graph nodes is both expensive and inherently biased, so this assumption cannot be met. GNNs can suffer poor generalization when this occurs, by overfitting to superfluous regularities present in the training data. In this work we present a method, Shift-Robust GNN (SR-GNN), designed to account for distributional differences between biased training data and a graph's true inference distribution. SR-GNN adapts GNN models to the presence of distributional shift between the nodes labeled for training and the rest of the dataset. We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning, where we see that SR-GNN outperforms other GNN baselines in accuracy, addressing at least ∼40% of the negative effects introduced by biased training data. On the largest dataset we consider, ogb-arxiv, we observe a 2% absolute improvement over the baseline and are able to mitigate 30% of the negative effects from training data bias.
UR - http://www.scopus.com/inward/record.url?scp=85119167066&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119167066&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85119167066
T3 - Advances in Neural Information Processing Systems
SP - 27965
EP - 27977
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
Y2 - 6 December 2021 through 14 December 2021
ER -