Abstract
Modern machines consist of multiple compute devices and complex memory hierarchies. For many applications, data movement between and within these compute devices must be as efficient as possible to obtain maximum performance. However, hand-optimizing code for one architecture will likely sacrifice both performance portability and software maintainability. In addition, some optimization decisions are best made at runtime. This suggests that the problem ought to be tackled on two fronts. First, provide the programmer with a declarative language to describe data layouts and data motion. This allows the runtime system to be tuned for each architecture by a specialist and frees the programmer to concentrate on the application itself. Second, exploit information available only at execution time to optimize the data movement code further. MPI derived datatypes accomplish the former task, and Just-In-Time (JIT) compilation can be used for the latter. In this paper, we present DAME, a language and interpreter designed to be used as the backend for MPI derived datatypes. We also present DAME-L and DAME-X, two JIT-enabled implementations of DAME. All three have been integrated into MPICH. We evaluate their performance on DDTBench and two mini-applications written with MPI derived datatypes and obtain communication speedups of up to 20× and mini-application speedups of up to 3×.
Original language | English (US) |
---|---|
Pages (from-to) | 760-774 |
Number of pages | 15 |
Journal | International Journal of High Performance Computing Applications |
Volume | 32 |
Issue number | 5 |
State | Published - Sep 1 2018 |
Keywords
- High Performance Computing
- Just In Time compilation
- MPI
- data movement
- derived datatypes
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture