Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning

Shuangyan Yang, Minjia Zhang, Wenqian Dong, Dong Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The Graph Neural Network (GNN) is showing outstanding results in improving the performance of graph-based applications. Recent studies demonstrate that GNN performance can be boosted via using more advanced aggregators, deeper aggregation depth, larger sampling rate, etc. While leading to promising results, the improvements come at a cost of significantly increased memory footprint, easily exceeding GPU memory capacity. In this paper, we introduce a method, Betty, to make GNN training more scalable and accessible via batch-level partitioning. Different from DNN training, a mini-batch in GNN has complex dependencies between input features and output labels, making batch-level partitioning difficult. Betty introduces two noveltechniques, redundancy-embedded graph (REG) partitioning and memory-aware partitioning, to effectively mitigate the redundancy and load imbalances issues across the partitions. Our evaluation of large-scale real-world datasets shows that Betty can significantly mitigate the memory bottleneck, enabling scalable GNN training with much deeper aggregation depths, larger sampling rate, larger training batch sizes, together with more advanced aggregators, with a few as a single GPU.

Original languageEnglish (US)
Title of host publicationASPLOS 2023 - Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
EditorsTor M. Aamodt, Natalie Enright Jerger, Michael Swift
PublisherAssociation for Computing Machinery
Pages103-117
Number of pages15
ISBN (Electronic)9781450399166
DOIs
StatePublished - Jan 27 2023
Externally publishedYes
Event28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023 - Vancouver, Canada
Duration: Mar 25 2023Mar 29 2023

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume2

Conference

Conference28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023
Country/TerritoryCanada
CityVancouver
Period3/25/233/29/23

Keywords

  • Efficient training method
  • Graph neural network
  • Graph partition
  • Heterogeneous memory
  • Load balancing
  • Redundancy elimination

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning'. Together they form a unique fingerprint.

Cite this