Pilgrim: Scalable and (near) Lossless MPI Tracing

Chen Wang, Pavan Balaji, Marc Snir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traces of MPI communications are used by many performance analysis and visualization tools. Storing exhaustive traces of large-scale MPI applications is infeasible due to their large volume. Aggregated or lossy MPI traces are smaller, but provide much less information. In this paper, we present Pilgrim, a near-lossless MPI tracing tool that incurs moderate overheads and generates small trace files at large scales by using sophisticated compression techniques. Furthermore, for codes with regular communication patterns, Pilgrim can store their traces in constant space regardless of the problem size, the number of processors, and the number of iterations. In comparison with existing tools, Pilgrim preserves more information with less space in all the programs we tested.

Original language: English (US)
Title of host publication: Proceedings of SC 2021
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450384421
State: Published - Nov 14 2021
Event: 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 - Virtual, Online, United States
Duration: Nov 14 2021 to Nov 19 2021

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
Country/Territory: United States
City: Virtual, Online
Period: 11/14/21 to 11/19/21

Keywords

  • Communication tracing
  • Lossless MPI tracing
  • Trace compression

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software
