Collective error detection for MPI collective operations

Chris Falzone, Anthony Chan, Ewing Lusk, William Gropp

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is to detect user errors in the use of MPI's collective operations. While some errors can be detected locally (by a single process), other errors involving the consistency of arguments passed to MPI collective functions must be tested for in a collective fashion. While the idea of using such a profiling library does not originate here, we take the idea further than it has been taken before (we detect more errors) and offer an open-source library that can be used with any MPI implementation. We describe the tests carried out, provide some details of the implementation, illustrate the usage of the library, and present performance tests.
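
The distinction the abstract draws between local checks and collective checks can be illustrated with a small simulation. The sketch below is not the library's code and does not use MPI. It assumes a broadcast-like call in which every rank passes a root, a datatype, and a count, and it models a datatype as a list of (basic type name, repeat) pairs. The helper names `type_signature`, `signature_hash`, and `check_bcast` are hypothetical, as is the choice of SHA-256. Comparing hashes of type signatures is only one plausible reading of the "Hashing" keyword, not a confirmed description of the authors' method.

```python
import hashlib

def type_signature(datatype, count):
    """Flatten `count` copies of a datatype into merged runs of basic types."""
    runs = []
    for _ in range(count):
        for basic, n in datatype:
            if runs and runs[-1][0] == basic:
                runs[-1][1] += n          # extend the current run
            else:
                runs.append([basic, n])   # start a new run
    return tuple((basic, n) for basic, n in runs)

def signature_hash(datatype, count):
    """Reduce a type signature to a fixed-size value that is cheap to exchange."""
    text = repr(type_signature(datatype, count))
    return hashlib.sha256(text.encode()).hexdigest()

def check_bcast(calls):
    """calls[i] is the (root, datatype, count) passed by simulated rank i.

    Returns a list of error messages; an empty list means the call is consistent.
    """
    errors = []
    ref_root, ref_type, ref_count = calls[0]
    ref_hash = signature_hash(ref_type, ref_count)
    for rank, (root, datatype, count) in enumerate(calls):
        # Local check: a single process can detect this without communication.
        if not 0 <= root < len(calls):
            errors.append(f"rank {rank}: root {root} is out of range")
        # Collective checks: these need a value from another process (here rank 0).
        if root != ref_root:
            errors.append(f"rank {rank}: root {root} differs from rank 0's root {ref_root}")
        if signature_hash(datatype, count) != ref_hash:
            errors.append(f"rank {rank}: datatype signature differs from rank 0's")
    return errors

# Illustrative datatypes: a single int, and a contiguous pair of ints.
INT = [("MPI_INT", 1)]
PAIR = [("MPI_INT", 2)]
```

In this model, four `INT` elements and two `PAIR` elements produce the same signature, so `check_bcast([(0, INT, 4), (0, PAIR, 2)])` reports no error, while a call in which one rank names a different root is flagged. A real profiling library would perform the comparison inside a wrapper around the MPI function, exchanging the reference values over the communicator before forwarding the call.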

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 138-147
Number of pages: 10
State: Published - 2005
Externally published: Yes
Event: 12th European PVM/MPI Users' Group Meeting - Recent Advances in Parallel Virtual Machine and Message Passing Interface - Sorrento, Italy
Duration: Sep 18 2005 → Sep 21 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3666 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th European PVM/MPI Users' Group Meeting - Recent Advances in Parallel Virtual Machine and Message Passing Interface
Country/Territory: Italy
City: Sorrento
Period: 9/18/05 → 9/21/05

Keywords

  • Collective
  • Datatype
  • Errors
  • Hashing
  • MPI

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
