TY - JOUR
T1 - Efficient Learning of Linear Graph Neural Networks via Node Subsampling
AU - Shin, Seiyun
AU - Shomorony, Ilan
AU - Zhao, Han
N1 - We thank Ruosong Wang and Yun Yang for helpful discussions, and the anonymous reviewers for their constructive feedback. Han Zhao and Seiyun Shin were partly supported by the Defense Advanced Research Projects Agency (DARPA) under Cooperative Agreement Number HR00112320012, an IBM-IL Discovery Accelerator Institute research award, and Amazon AWS Cloud Credit. The work of Ilan Shomorony was supported in part by the National Science Foundation (NSF) under grant CCF-2046991.
PY - 2023
Y1 - 2023
N2 - Graph Neural Networks (GNNs) are a powerful class of machine learning models with applications in recommender systems, drug discovery, social network analysis, and computer vision. One challenge with their implementation is that GNNs often take large-scale graphs as inputs, which imposes significant computational/storage costs in the training and testing phases. In particular, the message passing operations of a GNN require multiplication of the graph adjacency matrix A ∈ R^{n×n} and the data matrix X ∈ R^{n×d}, and the O(n^2 d) time complexity can be prohibitive for large n. Thus, a natural question is whether it is possible to perform the GNN operations in (quasi-)linear time by avoiding the full computation of AX. To study this question, we consider the setting of a regression task on a two-layer Linear Graph Convolutional Network (GCN). We develop an efficient training algorithm based on (1) performing node subsampling, (2) estimating the leverage scores of AX based on the subsampled graph, and (3) performing leverage score sampling on AX. We show that our proposed scheme learns the regression model observing only O(nd ε^{-2} log n) entries of A in time O(nd^2 ε^{-2} log n), with the guarantee that the learned weights deviate by at most ε under the ℓ2 norm from the model learned using the entire adjacency matrix A. We present empirical results for regression problems on real-world graphs and show that our algorithm significantly outperforms other baseline sampling strategies that exploit the same number of observations.
UR - http://www.scopus.com/inward/record.url?scp=85205445413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205445413&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85205445413
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -
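
A minimal illustrative sketch of the leverage-score sampling idea summarized in the abstract, assuming NumPy. This is not the paper's algorithm: here AX is formed exactly and the leverage scores come from a full SVD, whereas the paper estimates them from a node-subsampled graph precisely to avoid the O(n^2 d) cost; all function names and parameters below are hypothetical.

import numpy as np

def leverage_scores(M):
    """Exact leverage scores of M: squared row norms of its left singular vectors."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U**2, axis=1)  # shape (n,), sums to rank(M)

def leverage_score_regression(A, X, y, m, rng=np.random.default_rng(0)):
    """Approximately solve min_w ||A X w - y||_2 from m sampled-and-rescaled rows."""
    M = A @ X                              # n x d design matrix (exact, for illustration only)
    scores = leverage_scores(M)
    p = scores / scores.sum()              # sampling distribution over rows/nodes
    idx = rng.choice(len(y), size=m, replace=True, p=p)
    scale = 1.0 / np.sqrt(m * p[idx])      # rescaling keeps the sketched objective unbiased
    M_s = M[idx] * scale[:, None]
    y_s = y[idx] * scale
    w, *_ = np.linalg.lstsq(M_s, y_s, rcond=None)
    return w

# Toy usage on a random graph
n, d, m = 500, 10, 120
rng = np.random.default_rng(1)
A = (rng.random((n, n)) < 0.05).astype(float)
X = rng.standard_normal((n, d))
y = (A @ X) @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)
w_hat = leverage_score_regression(A, X, y, m)

The 1/sqrt(m p_i) rescaling makes the subsampled least-squares objective an unbiased estimate of the full one, which is the standard reason leverage-score-sampled regression can track the full-data solution up to a small error.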