An adaptive algorithm selection framework for reduction parallelization

Research output: Contribution to journal › Article › peer-review

Abstract

Irregular and dynamic memory reference patterns can cause performance variations for low-level algorithms in general and for parallel algorithms in particular. In this paper, we present an adaptive algorithm selection framework that can collect and interpret the characteristics of a particular instance of a parallel reduction and select the best-performing algorithm from an existing library. The framework consists of the following components: 1) an offline systematic process for characterizing the input sensitivity of parallel reduction algorithms and a method for building corresponding predictive performance models, 2) an online input characterization and algorithm selection module, and 3) a small library of parallel reduction algorithms, which represents the algorithmic choices made available at runtime. We also present one possible integration of this framework in a restructuring compiler. We validate our design experimentally and show that our framework 1) selects the most appropriate algorithm in 85 percent of the cases studied, 2) overall delivers 98 percent of the optimal performance, 3) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and 4) adapts to the underlying machine architectures (evaluated on IBM Regatta and HP V-Class systems).
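To make the abstract's idea of runtime algorithm selection concrete, the sketch below shows two common ways to parallelize an irregular reduction of the form hist[idx[i]] += val[i], plus a toy online characterization that chooses between them. This is an illustrative sketch only: the two strategies, the sampling heuristic, and the 0.1 threshold are assumptions made for this example, not the paper's actual algorithm library, characterization metrics, or learned performance models.

// Hedged sketch only: the strategies, the sampling heuristic, and the
// threshold below are illustrative stand-ins, not the paper's actual
// library, input characterization, or offline-built performance models.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <vector>

// Strategy A: atomic updates -- no extra memory, but each update may
// contend with other threads writing the same element.
void reduce_atomic(std::vector<double>& hist,
                   const std::vector<int>& idx,
                   const std::vector<double>& val) {
    #pragma omp parallel for
    for (long i = 0; i < (long)idx.size(); ++i) {
        #pragma omp atomic
        hist[idx[i]] += val[i];
    }
}

// Strategy B: replicated (privatized) buffers merged after the loop --
// contention-free updates at the cost of O(threads * |hist|) memory.
void reduce_replicated(std::vector<double>& hist,
                       const std::vector<int>& idx,
                       const std::vector<double>& val) {
    const int nt = omp_get_max_threads();
    std::vector<std::vector<double>> priv(nt, std::vector<double>(hist.size(), 0.0));
    #pragma omp parallel
    {
        std::vector<double>& mine = priv[omp_get_thread_num()];
        #pragma omp for
        for (long i = 0; i < (long)idx.size(); ++i)
            mine[idx[i]] += val[i];
    }
    for (int t = 0; t < nt; ++t)                        // serial merge pass
        for (std::size_t j = 0; j < hist.size(); ++j)
            hist[j] += priv[t][j];
}

// Toy online characterization: sample the index array to estimate what
// fraction of the reduction array the updates actually touch, then let a
// placeholder threshold (standing in for a predictive performance model)
// pick the strategy.
void reduce_adaptive(std::vector<double>& hist,
                     const std::vector<int>& idx,
                     const std::vector<double>& val) {
    std::vector<char> touched(hist.size(), 0);
    const std::size_t sample = std::min<std::size_t>(idx.size(), 10000);
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < sample; ++i)
        if (!touched[idx[i]]) { touched[idx[i]] = 1; ++distinct; }
    const double density = hist.empty() ? 1.0 : (double)distinct / (double)hist.size();

    if (density < 0.1)
        reduce_replicated(hist, idx, val);  // updates hit few elements: avoid contention
    else
        reduce_atomic(hist, idx, val);      // updates spread out: skip replication cost
}

int main() {
    const int n = 1 << 20, bins = 1 << 10;
    std::vector<int> idx(n);
    std::vector<double> val(n, 1.0), hist(bins, 0.0);
    for (int i = 0; i < n; ++i) idx[i] = (i * 7) % bins;  // synthetic index pattern
    reduce_adaptive(hist, idx, val);
    std::printf("hist[0] = %g (expected %d)\n", hist[0], n / bins);
    return 0;
}

In the framework described by the abstract, the hard-coded threshold above would instead be a performance model built offline per machine, which is what allows the selection to adapt across architectures and across dynamic phases of a program.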

Original language: English (US)
Pages (from-to): 1084-1095
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 17
Issue number: 10
DOIs
State: Published - Oct 2006
Externally published: Yes

Keywords

  • Adaptive optimization
  • Compiler optimization
  • Reduction parallelization
  • Runtime parallelization

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
