Improving the memory access locality of hybrid MPI applications

Matthias Diener, Sam White, Laxmikant V Kale, Michael Campbell, Daniel J Bodony, Jonathan Freund

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Maintaining memory access locality is continuing to be a challenge for parallel applications and their runtime environments. By exploiting locality, application performance, resource usage, and performance portability can be improved. The main challenge is to detect and fix memory locality issues for applications that use shared-memory programming models for intra-node parallelization. In this paper, we investigate improving memory access locality of a hybrid MPI+OpenMP application in two different ways, by manually fixing locality issues in its source code and by employing the Adaptive MPI (AMPI) runtime environment. Results show that AMPI can result in similar locality improvements as manual source code changes, leading to substantial performance and scalability gains compared to the unoptimized version and to a pure MPI runtime. Compared to the hybrid MPI+OpenMP baseline, our optimizations improved performance by 1.8x on a single cluster node, and by 1.4x on 32 nodes, with a speedup of 2.4x compared to a pure MPI execution on 32 nodes. In addition to performance, we also evaluate the impact of memory locality on the load balance within a node.
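The speedup figures in the abstract can be read as runtime reductions. The short sketch below does that arithmetic, assuming the usual definition of speedup (baseline runtime divided by optimized runtime), which the abstract does not state explicitly. The function and dictionary names are illustrative and do not come from the paper.

```python
# Speedups quoted in the abstract: optimized version relative to each baseline.
# Assumption: speedup = baseline_runtime / optimized_runtime.
SPEEDUPS = {
    "hybrid MPI+OpenMP baseline, 1 node": 1.8,
    "hybrid MPI+OpenMP baseline, 32 nodes": 1.4,
    "pure MPI execution, 32 nodes": 2.4,
}


def runtime_fraction(speedup: float) -> float:
    """Fraction of the baseline runtime that remains after the given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def time_saved_percent(speedup: float) -> float:
    """Percentage of the baseline runtime eliminated by the given speedup."""
    return (1.0 - runtime_fraction(speedup)) * 100.0


if __name__ == "__main__":
    for baseline, s in SPEEDUPS.items():
        print(f"{s:.1f}x vs {baseline}: {time_saved_percent(s):.1f}% less runtime")
```

Under that definition, 1.8x corresponds to about 44.4% less runtime, 1.4x to about 28.6%, and 2.4x to about 58.3%.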

Original language: English (US)
Title of host publication: EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting
Publisher: Association for Computing Machinery
ISBN (Print): 9781450348492
DOIs: https://doi.org/10.1145/3127024.3127038
State: Published - Sep 25 2017
Event: 24th European MPI Users' Group Meeting, EuroMPI 2017 - Chicago, United States
Duration: Sep 25 2017 → Sep 28 2017

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 24th European MPI Users' Group Meeting, EuroMPI 2017
Country: United States
City: Chicago
Period: 9/25/17 → 9/28/17

Fingerprint

Data storage equipment
Computer programming
Scalability

Keywords

  • AMPI
  • Hybrid applications
  • Load balancing
  • MPI
  • Memory access locality
  • OpenMP

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Diener, M., White, S., Kale, L. V., Campbell, M., Bodony, D. J., & Freund, J. (2017). Improving the memory access locality of hybrid MPI applications. In EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting [a11] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3127024.3127038

Improving the memory access locality of hybrid MPI applications. / Diener, Matthias; White, Sam; Kale, Laxmikant V; Campbell, Michael; Bodony, Daniel J; Freund, Jonathan.

EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting. Association for Computing Machinery, 2017. a11 (ACM International Conference Proceeding Series).

Diener, M, White, S, Kale, LV, Campbell, M, Bodony, DJ & Freund, J 2017, Improving the memory access locality of hybrid MPI applications. in EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting., a11, ACM International Conference Proceeding Series, Association for Computing Machinery, 24th European MPI Users' Group Meeting, EuroMPI 2017, Chicago, United States, 9/25/17. https://doi.org/10.1145/3127024.3127038
Diener M, White S, Kale LV, Campbell M, Bodony DJ, Freund J. Improving the memory access locality of hybrid MPI applications. In EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting. Association for Computing Machinery. 2017. a11. (ACM International Conference Proceeding Series). https://doi.org/10.1145/3127024.3127038
Diener, Matthias ; White, Sam ; Kale, Laxmikant V ; Campbell, Michael ; Bodony, Daniel J ; Freund, Jonathan. / Improving the memory access locality of hybrid MPI applications. EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting. Association for Computing Machinery, 2017. (ACM International Conference Proceeding Series).
@inproceedings{290d32371ed64a19a646e7d3ade9da98,
title = "Improving the memory access locality of hybrid MPI applications",
abstract = "Maintaining memory access locality is continuing to be a challenge for parallel applications and their runtime environments. By exploiting locality, application performance, resource usage, and performance portability can be improved. The main challenge is to detect and fix memory locality issues for applications that use shared-memory programming models for intra-node parallelization. In this paper, we investigate improving memory access locality of a hybrid MPI+OpenMP application in two different ways, by manually fixing locality issues in its source code and by employing the Adaptive MPI (AMPI) runtime environment. Results show that AMPI can result in similar locality improvements as manual source code changes, leading to substantial performance and scalability gains compared to the unoptimized version and to a pure MPI runtime. Compared to the hybrid MPI+OpenMP baseline, our optimizations improved performance by 1.8x on a single cluster node, and by 1.4x on 32 nodes, with a speedup of 2.4x compared to a pure MPI execution on 32 nodes. In addition to performance, we also evaluate the impact of memory locality on the load balance within a node.",
keywords = "AMPI, Hybrid applications, Load balancing, MPI, Memory access locality, OpenMP",
author = "Matthias Diener and Sam White and Kale, {Laxmikant V} and Michael Campbell and Bodony, {Daniel J} and Jonathan Freund",
year = "2017",
month = "9",
day = "25",
doi = "10.1145/3127024.3127038",
language = "English (US)",
isbn = "9781450348492",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting",

}

TY - GEN

T1 - Improving the memory access locality of hybrid MPI applications

AU - Diener, Matthias

AU - White, Sam

AU - Kale, Laxmikant V

AU - Campbell, Michael

AU - Bodony, Daniel J

AU - Freund, Jonathan

PY - 2017/9/25

Y1 - 2017/9/25

AB - Maintaining memory access locality is continuing to be a challenge for parallel applications and their runtime environments. By exploiting locality, application performance, resource usage, and performance portability can be improved. The main challenge is to detect and fix memory locality issues for applications that use shared-memory programming models for intra-node parallelization. In this paper, we investigate improving memory access locality of a hybrid MPI+OpenMP application in two different ways, by manually fixing locality issues in its source code and by employing the Adaptive MPI (AMPI) runtime environment. Results show that AMPI can result in similar locality improvements as manual source code changes, leading to substantial performance and scalability gains compared to the unoptimized version and to a pure MPI runtime. Compared to the hybrid MPI+OpenMP baseline, our optimizations improved performance by 1.8x on a single cluster node, and by 1.4x on 32 nodes, with a speedup of 2.4x compared to a pure MPI execution on 32 nodes. In addition to performance, we also evaluate the impact of memory locality on the load balance within a node.

KW - AMPI

KW - Hybrid applications

KW - Load balancing

KW - MPI

KW - Memory access locality

KW - OpenMP

UR - http://www.scopus.com/inward/record.url?scp=85054249616&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054249616&partnerID=8YFLogxK

U2 - 10.1145/3127024.3127038

DO - 10.1145/3127024.3127038

M3 - Conference contribution

AN - SCOPUS:85054249616

SN - 9781450348492

T3 - ACM International Conference Proceeding Series

BT - EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting

PB - Association for Computing Machinery

ER -