Abstract
This chapter introduces the parallel reduction pattern, which plays an important role in many data-processing applications. When the reduction operator is associative and commutative, the computation can be parallelized as a reduction tree and then optimized aggressively. The optimization techniques covered are thread index assignment that reduces control and memory divergence, use of shared memory to reduce global memory accesses, thread coarsening, and segmented reduction. These techniques are needed to achieve high performance on large inputs.
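
As an illustration only (not code from the chapter), the reduction tree can be simulated sequentially in Python. The function name `tree_reduce` and its parameters are hypothetical. The inner loop stands in for the threads of one parallel step, and the halving stride is one common way to keep the active "threads" contiguous; a real GPU kernel would also need a barrier between steps.

```python
import operator


def tree_reduce(values, op=operator.add, identity=0):
    """Sequentially simulate a reduction tree (illustrative sketch only).

    Assumes `op` is associative and commutative and that `identity`
    is its identity value (0 for +, 1 for *, -inf for max, ...).
    """
    data = list(values)

    # Pad to the next power of two with the identity value so that
    # every tree level pairs elements evenly without changing the result.
    n = 1
    while n < len(data):
        n *= 2
    data += [identity] * (n - len(data))

    # Each pass of the outer loop is one level of the tree.
    stride = n // 2
    while stride >= 1:
        # On a GPU, each i would be a separate thread. The active
        # indices 0..stride-1 are contiguous at every level.
        for i in range(stride):
            # Pairing element i with element i + stride reorders the
            # operands, which is why commutativity is assumed.
            data[i] = op(data[i], data[i + stride])
        # A parallel version would synchronize all threads here.
        stride //= 2

    return data[0]
```

For example, `tree_reduce([1, 2, 3, 4, 5])` sums the list in three tree levels instead of four sequential additions, and `tree_reduce([3, 9, 2], max, float("-inf"))` computes a maximum with the same tree shape. An empty input returns the identity value.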
Original language | English (US) |
---|---|
Title of host publication | Programming Massively Parallel Processors |
Subtitle of host publication | a Hands-on Approach, Fourth Edition |
Publisher | Elsevier |
Pages | 211-233 |
Number of pages | 23 |
ISBN (Electronic) | 9780323912310 |
ISBN (Print) | 9780323984638 |
DOIs | |
State | Published - Jan 1 2022 |
Keywords
- reduction trees
- associative operators
- barrier synchronization
- commutative operators
- control divergence
- execution resource utilization efficiency
- identity value
- memory coalescing
- memory divergence
- segmented reduction
- speedup
- thread coarsening
- thread index to data index mapping
ASJC Scopus subject areas
- General Computer Science