Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics

Roshan Dathathri, Gurbinder Gill, Loc Hoang, Vishwesh Jatala, Keshav Pingali, V. Krishna Nandivada, Hoang Vu Dang, Marc Snir

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported efficiently by current message transport layers, but bulk-synchronization can exacerbate the performance impact of load imbalance because a round cannot be completed until every host has completed that round. Asynchronous distributed graph analytics systems circumvent this problem by permitting hosts to make progress at their own pace, but existing systems either use global locks and send small messages or send large messages but do not support general partitioning policies such as vertex-cuts. Consequently, they perform substantially worse than bulk-synchronous systems. Moreover, none of their programming or execution models can be easily adapted for heterogeneous devices like GPUs. In this paper, we design and implement a lock-free, non-blocking, bulk-Asynchronous runtime called Gluon-Async for distributed and heterogeneous graph analytics. The runtime supports any partitioning policy and uses bulk-communication. We present the bulk-Asynchronous parallel (BASP) model which allows the programmer to utilize the runtime by specifying only the abstract communication required. Applications written in this model are compared with the BSP programs written using (1) D-Galois and D-IrGL, the state-of-The-Art distributed graph analytics systems (which are bulk-synchronous) for CPUs and GPUs, respectively, and (2) Lux, another (bulk-synchronous) distributed GPU graph analytical system. Our evaluation shows that programs written using BASP-style execution are on average ~1.5x faster than those in D-Galois and D-IrGL on real-world large-diameter graphs at scale. They are also on average ~12x faster than Lux. To the best of our knowledge, Gluon-Async is the first asynchronous distributed GPU graph analytics system.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages15-28
Number of pages14
ISBN (Electronic)9781728136134
DOIs
StatePublished - Sep 2019
Externally publishedYes
Event28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019 - Seattle, United States
Duration: Sep 21 2019Sep 25 2019

Publication series

NameParallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume2019-September
ISSN (Print)1089-795X

Conference

Conference28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
Country/TerritoryUnited States
CitySeattle
Period9/21/199/25/19

Keywords

  • BSP model
  • asynchronous parallel execution models
  • distributed and heterogeneous
  • graph analytics

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics'. Together they form a unique fingerprint.

Cite this