TY - GEN
T1 - MultiSage
T2 - 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
AU - Yang, Carl
AU - Pal, Aditya
AU - Zhai, Andrew
AU - Pancha, Nikil
AU - Han, Jiawei
AU - Rosenberg, Charles
AU - Leskovec, Jure
N1 - Publisher Copyright:
© 2020 Owner/Author.
PY - 2020/8/23
Y1 - 2020/8/23
N2 - Graph convolutional networks (GCNs) are a powerful class of graph neural networks. Trained in a semi-supervised end-to-end fashion, GCNs can learn to integrate node features and graph structures to generate high-quality embeddings that can be used for various downstream tasks like search and recommendation. However, existing GCNs mostly work on homogeneous graphs and consider a single embedding for each node, which do not sufficiently model the multi-facet nature and complex interaction of nodes in real-world networks. Here, we present a contextualized GCN engine by modeling the multipartite networks of target nodes and their intermediatecontext nodes that specify the contexts of their interactions. Towards the neighborhood aggregation process, we devise a contextual masking operation at the feature level and a contextual attention mechanism at the node level to achieve interaction contextualization by treating neighboring target nodes based on intermediate context nodes. Consequently, we compute multiple embeddings for target nodes that capture their diverse facets and different interactions during graph convolution, which is useful for fine-grained downstream applications. To enable efficient web-scale training, we build a parallel random walk engine to pre-sample contextualized neighbors, and a Hadoop2-based data provider pipeline to pre-join training data, dynamically reduce multi-GPU training time, and avoid high memory cost. Extensive experiments on the bipartite Pinterest graph and tripartite OAG graph corroborate the advantage of the proposed system.
AB - Graph convolutional networks (GCNs) are a powerful class of graph neural networks. Trained in a semi-supervised end-to-end fashion, GCNs can learn to integrate node features and graph structures to generate high-quality embeddings that can be used for various downstream tasks like search and recommendation. However, existing GCNs mostly work on homogeneous graphs and consider a single embedding for each node, which do not sufficiently model the multi-facet nature and complex interaction of nodes in real-world networks. Here, we present a contextualized GCN engine by modeling the multipartite networks of target nodes and their intermediatecontext nodes that specify the contexts of their interactions. Towards the neighborhood aggregation process, we devise a contextual masking operation at the feature level and a contextual attention mechanism at the node level to achieve interaction contextualization by treating neighboring target nodes based on intermediate context nodes. Consequently, we compute multiple embeddings for target nodes that capture their diverse facets and different interactions during graph convolution, which is useful for fine-grained downstream applications. To enable efficient web-scale training, we build a parallel random walk engine to pre-sample contextualized neighbors, and a Hadoop2-based data provider pipeline to pre-join training data, dynamically reduce multi-GPU training time, and avoid high memory cost. Extensive experiments on the bipartite Pinterest graph and tripartite OAG graph corroborate the advantage of the proposed system.
KW - contextualized multi-embedding
KW - graph neural network
KW - search and recommendation
KW - web-scale training and inference
UR - http://www.scopus.com/inward/record.url?scp=85090420231&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090420231&partnerID=8YFLogxK
U2 - 10.1145/3394486.3403293
DO - 10.1145/3394486.3403293
M3 - Conference contribution
AN - SCOPUS:85090420231
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2434
EP - 2443
BT - KDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 23 August 2020 through 27 August 2020
ER -