Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI

P. Balaji, D. Buntinas, S. Balay, B. Smith, R. Thakur, W. Gropp

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Due to the complexity associated with developing parallel applications, scientists and engineers rely on high-level software libraries such as PETSc, ScaLAPACK, and PESSL to ease this task. Such libraries assist developers by providing abstractions for mathematical operations, data representation, and management of parallel data layouts, while internally using communication libraries such as MPI and PVM. Because high-level libraries manage data layout and communication internally, they can be expected to organize application data so that the library operations perform optimally. However, this places additional overhead on the underlying communication library by making the data layout noncontiguous in memory and the communication volumes (the data transferred by a process to each of the other processes) nonuniform. In this paper, we analyze the overheads associated with these two aspects (noncontiguous data layouts and nonuniform communication volumes) in the context of the PETSc software toolkit over the MPI communication library. We describe the issues with the current approaches used by MPICH2 (an implementation of MPI), propose alternative approaches to handle these issues, and evaluate these approaches with micro-benchmarks as well as an application built on the PETSc library. Our experimental results demonstrate close to an order of magnitude improvement in the performance of a 3-D Laplacian multigrid solver application when evaluated on a 128-processor cluster.
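To make the two overheads concrete, the sketch below (not taken from the paper) shows how an MPI application typically expresses a noncontiguous layout with a derived datatype (MPI_Type_vector) and a nonuniform exchange with MPI_Alltoallv; the strided layout, per-peer counts, and buffer sizes are illustrative assumptions only.

/*
 * Minimal sketch of the two patterns the abstract describes:
 *   1) a noncontiguous memory layout described by an MPI derived datatype,
 *   2) a nonuniform exchange in which each process sends a different
 *      amount of data to every other process (MPI_Alltoallv).
 * All sizes here are made up for illustration.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Noncontiguous layout: every 4th double of a strided array, described
       once as a derived datatype instead of being packed by hand. */
    const int count = 64, blocklen = 1, stride = 4;
    double *strided = calloc((size_t)count * stride, sizeof(double));
    MPI_Datatype vec;
    MPI_Type_vector(count, blocklen, stride, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    if (size >= 2) {
        if (rank == 0)
            MPI_Send(strided, 1, vec, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(strided, 1, vec, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Nonuniform volumes: process i sends (i + j + 1) doubles to process j,
       so the amount exchanged differs for every pair of processes.
       MPI_Alltoallv takes per-destination counts and displacements. */
    int *scounts = malloc(size * sizeof(int));
    int *rcounts = malloc(size * sizeof(int));
    int *sdispls = malloc(size * sizeof(int));
    int *rdispls = malloc(size * sizeof(int));
    int stotal = 0, rtotal = 0;
    for (int j = 0; j < size; j++) {
        scounts[j] = rank + j + 1;   /* what this rank sends to j   */
        rcounts[j] = j + rank + 1;   /* what rank j sends to this rank */
        sdispls[j] = stotal;  stotal += scounts[j];
        rdispls[j] = rtotal;  rtotal += rcounts[j];
    }
    double *sendbuf = calloc((size_t)stotal, sizeof(double));
    double *recvbuf = calloc((size_t)rtotal, sizeof(double));
    MPI_Alltoallv(sendbuf, scounts, sdispls, MPI_DOUBLE,
                  recvbuf, rcounts, rdispls, MPI_DOUBLE, MPI_COMM_WORLD);

    MPI_Type_free(&vec);
    free(strided); free(scounts); free(rcounts); free(sdispls); free(rdispls);
    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}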

Original language: English (US)
Title of host publication: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM
DOI: 10.1109/IPDPS.2007.370223
ISBN (Print): 1424409101, 9781424409105
State: Published - Sep 24 2007
Event: 21st International Parallel and Distributed Processing Symposium, IPDPS 2007 - Long Beach, CA, United States
Duration: Mar 26 2007 - Mar 30 2007

Publication series

Name: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM

Other

Other: 21st International Parallel and Distributed Processing Symposium, IPDPS 2007
Country: United States
City: Long Beach, CA
Period: 3/26/07 - 3/30/07

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Mathematics (all)

Cite this

Balaji, P., Buntinas, D., Balay, S., Smith, B., Thakur, R., & Gropp, W. (2007). Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI. In Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM [4227951] (Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM). https://doi.org/10.1109/IPDPS.2007.370223
