TY - GEN
T1 - Optimization strategies for MPI-interoperable active messages
AU - Zhao, Xin
AU - Balaji, Pavan
AU - Gropp, William D
AU - Thakur, Rajeev
PY - 2013
Y1 - 2013
N2 - Data-intensive applications, such as those in bioinformatics and social network analysis, differ from traditional scientific applications in that they often involve data-driven and irregular computation/communication patterns, making them ill-suited for traditional data movement approaches. Active Messages (AM) is an alternative programming model that allows computation to be moved dynamically closer to the data, rather than moving the data to the local process. In our previous work, we proposed an MPI-interoperable AM framework that allows existing MPI applications to incrementally take advantage of AM capabilities. While that work presented a baseline implementation of how AMs semantically interact with the rest of the MPI infrastructure, it had several performance shortcomings. In this paper, we analyze these shortcomings and propose three optimization strategies: one derived implicitly by the MPI implementation and two that rely on explicit hints from the application user. In addition to a detailed description of these optimization strategies, the paper presents a thorough performance evaluation on a 4096-core cluster that demonstrates considerable performance advantages from these strategies.
KW - Active messages
KW - Data-intensive applications
KW - MPI
KW - Multicore
KW - RMA
UR - http://www.scopus.com/inward/record.url?scp=84904498093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904498093&partnerID=8YFLogxK
U2 - 10.1109/DASC.2013.116
DO - 10.1109/DASC.2013.116
M3 - Conference contribution
AN - SCOPUS:84904498093
SN - 9781479933815
T3 - Proceedings - 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, DASC 2013
SP - 508
EP - 515
BT - Proceedings - 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, DASC 2013
PB - IEEE Computer Society
T2 - 11th IEEE International Conference on Dependable, Autonomic and Secure Computing, DASC 2013
Y2 - 21 December 2013 through 22 December 2013
ER -