TY - JOUR
T1 - An adaptive algorithm selection framework for reduction parallelization
AU - Yu, Hao
AU - Rauchwerger, Lawrence
N1 - Funding Information:
This research was performed at Texas A&M University and supported in part by US National Science Foundation (NSF) Career Award CCR-9734471, NSF Grant ACI-9872126, NSF Grant EIA-0103742, NSF Grant ACI-0326350, NSF Grant ACI-0113971, and US Department of Energy ASCI ASAP Level 2 Grant B347886.
PY - 2006/10
Y1 - 2006/10
N2 - Irregular and dynamic memory reference patterns can cause performance variations for low level algorithms in general and for parallel algorithms in particular. In this paper, we present an adaptive algorithm selection framework which can collect and interpret the characteristics of a particular instance of parallel reduction algorithms and select the best performing one from an existing library. The framework consists of the following components: 1) an offline systematic process for characterizing the input sensitivity of parallel reduction algorithms and a method for building corresponding predictive performance models, 2) an online input characterization and algorithm selection module, and 3) a small library of parallel reduction algorithms, which represent the algorithmic choices made available at runtime. We also present one possible integration of this framework in a restructuring compiler. We validate our design experimentally and show that our framework 1) selects the most appropriate algorithms in 85 percent of the cases studied, 2) overall, delivers 98 percent of the optimal performance, 3) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and 4) adapts to the underlying machine architectures (evaluated on IBM Regatta and HP V-Class systems).
AB - Irregular and dynamic memory reference patterns can cause performance variations for low level algorithms in general and for parallel algorithms in particular. In this paper, we present an adaptive algorithm selection framework which can collect and interpret the characteristics of a particular instance of parallel reduction algorithms and select the best performing one from an existing library. The framework consists of the following components: 1) an offline systematic process for characterizing the input sensitivity of parallel reduction algorithms and a method for building corresponding predictive performance models, 2) an online input characterization and algorithm selection module, and 3) a small library of parallel reduction algorithms, which represent the algorithmic choices made available at runtime. We also present one possible integration of this framework in a restructuring compiler. We validate our design experimentally and show that our framework 1) selects the most appropriate algorithms in 85 percent of the cases studied, 2) overall, delivers 98 percent of the optimal performance, 3) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and 4) adapts to the underlying machine architectures (evaluated on IBM Regatta and HP V-Class systems).
KW - Adaptive optimization
KW - Compiler optimization
KW - Reduction parallelization
KW - Runtime parallelization
UR - http://www.scopus.com/inward/record.url?scp=33947389595&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947389595&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2006.131
DO - 10.1109/TPDS.2006.131
M3 - Article
AN - SCOPUS:33947389595
SN - 1045-9219
VL - 17
SP - 1084
EP - 1095
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 10
ER -