Abstract
Maintaining memory access locality continues to be a challenge for parallel applications and their runtime environments. Exploiting locality can improve application performance, resource usage, and performance portability. The main challenge is to detect and fix memory locality issues in applications that use shared-memory programming models for intra-node parallelization. In this paper, we investigate improving the memory access locality of a hybrid MPI+OpenMP application in two different ways: by manually fixing locality issues in its source code and by employing the Adaptive MPI (AMPI) runtime environment. Results show that AMPI can achieve locality improvements similar to those of manual source code changes, leading to substantial performance and scalability gains compared to the unoptimized version and to a pure MPI runtime. Compared to the hybrid MPI+OpenMP baseline, our optimizations improved performance by 1.8x on a single cluster node and by 1.4x on 32 nodes, with a speedup of 2.4x compared to a pure MPI execution on 32 nodes. In addition to performance, we also evaluate the impact of memory locality on the load balance within a node.
Original language | English (US) |
---|---|
Title of host publication | EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting |
Publisher | Association for Computing Machinery |
ISBN (Print) | 9781450348492 |
DOIs | 10.1145/3127024.3127038 |
State | Published - Sep 25 2017 |
Event | 24th European MPI Users' Group Meeting, EuroMPI 2017 - Chicago, United States; Duration: Sep 25 2017 → Sep 28 2017 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Other
Other | 24th European MPI Users' Group Meeting, EuroMPI 2017 |
---|---|
Country | United States |
City | Chicago |
Period | 9/25/17 → 9/28/17 |
Keywords
- AMPI
- Hybrid applications
- Load balancing
- MPI
- Memory access locality
- OpenMP
ASJC Scopus subject areas
- Software
- Human-Computer Interaction
- Computer Vision and Pattern Recognition
- Computer Networks and Communications
Cite this
Improving the memory access locality of hybrid MPI applications. / Diener, Matthias; White, Sam; Kale, Laxmikant V; Campbell, Michael; Bodony, Daniel J; Freund, Jonathan.
EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting. Association for Computing Machinery, 2017. a11 (ACM International Conference Proceeding Series).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Improving the memory access locality of hybrid MPI applications
AU - Diener, Matthias
AU - White, Sam
AU - Kale, Laxmikant V
AU - Campbell, Michael
AU - Bodony, Daniel J
AU - Freund, Jonathan
PY - 2017/9/25
Y1 - 2017/9/25
KW - AMPI
KW - Hybrid applications
KW - Load balancing
KW - MPI
KW - Memory access locality
KW - OpenMP
UR - http://www.scopus.com/inward/record.url?scp=85054249616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054249616&partnerID=8YFLogxK
U2 - 10.1145/3127024.3127038
DO - 10.1145/3127024.3127038
M3 - Conference contribution
AN - SCOPUS:85054249616
SN - 9781450348492
T3 - ACM International Conference Proceeding Series
BT - EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting
PB - Association for Computing Machinery
ER -