Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine

Omri Mor, George Bosilca, Marc Snir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

There is a growing interest in Asynchronous Many-Task (AMT) runtimes as an efficient way to map irregular and dynamic parallel applications onto heterogeneous computing resources. In this work, we show that AMTs nonetheless struggle with communication bottlenecks when scaling computations strongly and that the design of commonly used communication libraries such as MPI contributes to these bottlenecks. We replace MPI with LCI, a Lightweight Communication Interface that is designed for dynamic, asynchronous frameworks, as the communication layer for the PaRSEC runtime. The result is a significant reduction of end-to-end latency in communication microbenchmarks and a reduction of overall time-to-solution by up to 12% in HiCMA, a tile-based low-rank Cholesky factorization package.

Original language: English (US)
Title of host publication: 52nd International Conference on Parallel Processing, ICPP 2023 - Main Conference Proceedings
Publisher: Association for Computing Machinery
Pages: 153-162
Number of pages: 10
ISBN (Electronic): 9798400708435
State: Published - Aug 7 2023
Event: 52nd International Conference on Parallel Processing, ICPP 2023 - Salt Lake City, United States
Duration: Aug 7 2023 to Aug 10 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 52nd International Conference on Parallel Processing, ICPP 2023
Country/Territory: United States
City: Salt Lake City
Period: 8/7/23 to 8/10/23

Keywords

  • MPI
  • asynchronous many-task
  • dynamic runtime
  • lightweight communication
  • low-rank Cholesky
  • message-passing
  • strong scaling

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
