TY - JOUR
T1 - DAME: Runtime-compilation for data movement
T2 - International Journal of High Performance Computing Applications
AU - Prabhu, Tarun
AU - Gropp, William
N1 - The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based (in part) upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Modern machines consist of multiple compute devices and complex memory hierarchies. For many applications, it is imperative that any data movement between and within the various compute devices be done as efficiently as possible in order to obtain maximum performance. However, hand-optimizing code for one architecture will likely sacrifice both performance portability and software maintainability. In addition, some optimization decisions are best made at runtime. This suggests that the problem ought to be tackled on two fronts. First, provide the programmer with a declarative language to describe data layouts and data motion. This would allow the runtime system to be tuned for each architecture by a specialist and free the programmer to concentrate on the application itself. Second, exploit the execution time information to optimize the data movement code further. MPI derived datatypes accomplish the former task and Just In Time (JIT) compilation can be used for the latter. In this paper, we present DAME—a language and interpreter designed to be used as the backend for MPI derived datatypes. We also present DAME-L and DAME-X, two JIT-enabled implementations of DAME, all of which have been integrated into MPICH. We evaluate their performance on DDTBench and two mini-applications written with MPI derived datatypes and obtain communication speedups of up to 20× and mini-application speedups of up to 3×.
KW - High Performance Computing
KW - Just In Time compilation
KW - MPI
KW - data movement
KW - derived datatypes
UR - http://www.scopus.com/inward/record.url?scp=85048476390&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048476390&partnerID=8YFLogxK
U2 - 10.1177/1094342017695444
DO - 10.1177/1094342017695444
M3 - Article
AN - SCOPUS:85048476390
SN - 1094-3420
VL - 32
SP - 760
EP - 774
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 5
ER -