Abstract
This chapter introduces key concepts in the compute architecture of modern GPUs that are important to CUDA C programmers. It first gives an overview of GPU execution resources, such as streaming multiprocessors (SMs). It then discusses how thread blocks are assigned to SMs and divided into warps for scheduling purposes, and gives more detail on single-instruction, multiple-data (SIMD) execution hardware, warp scheduling, latency tolerance, control divergence, and the effects of resource limitations. The chapter concludes with an introduction to the concept of resource queries.
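The resource queries the abstract mentions correspond to the CUDA runtime's device property query API. As a minimal sketch (not drawn from the chapter itself, and with error checking omitted for brevity), the following host program uses `cudaGetDeviceProperties` to report the execution resources discussed above, such as the number of SMs, the warp size, and per-block and per-SM thread limits:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);  // number of CUDA-capable devices

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);  // fill in the property struct

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Streaming multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  Warp size:                       %d\n", prop.warpSize);
        printf("  Max threads per block:           %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:              %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Registers per SM:                %d\n", prop.regsPerMultiprocessor);
        printf("  Shared memory per SM (bytes):    %zu\n", prop.sharedMemPerMultiprocessor);
    }
    return 0;
}
```

Queries like these are what make dynamic resource partitioning and occupancy reasoning portable: a program can size its grids and blocks against the reported limits rather than hard-coding values for one GPU generation.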
Original language | English (US) |
---|---|
Title of host publication | Programming Massively Parallel Processors |
Subtitle of host publication | A Hands-on Approach, Fourth Edition |
Publisher | Elsevier |
Pages | 69-92 |
Number of pages | 24 |
ISBN (Electronic) | 9780323912310 |
ISBN (Print) | 9780323984638 |
DOIs | |
State | Published - Jan 1 2022 |
Keywords
- Thread scheduling
- barrier synchronization
- control divergence
- deadlock
- device property query
- dynamic resource partitioning
- latency tolerance
- linear layout of threads
- occupancy
- streaming multiprocessors
- transparent scalability
- warp scheduling
- zero-overhead thread scheduling
ASJC Scopus subject areas
- General Computer Science