Compute architecture and scheduling

Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj

Research output: Chapter in Book/Report/Conference proceeding › Chapter


This chapter introduces key concepts in the compute architectures of modern GPUs that are important to CUDA C programmers. It first gives an overview of the GPU execution resources, such as streaming multiprocessors (SMs). It then discusses how blocks are assigned to SMs and divided into warps for scheduling purposes. It goes on to detail the single-instruction, multiple-data (SIMD) execution hardware, warp scheduling, latency tolerance, control divergence, and the effects of resource limitations. The chapter concludes with an introduction to the concept of resource queries.
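As a minimal sketch of the resource-query concept mentioned in the abstract, the CUDA runtime API exposes device properties such as the number of SMs and the warp size. The program below assumes a system with the CUDA toolkit installed and at least one CUDA-capable device; error handling is omitted for brevity.

```cuda
// Sketch: query compute resources via the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int devCount = 0;
    cudaGetDeviceCount(&devCount);

    for (int i = 0; i < devCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);

        // Execution resources discussed in the chapter:
        // SM count, warp size, and per-block/per-SM thread limits.
        printf("Device %d: %s\n", i, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Warp size:                 %d\n", prop.warpSize);
        printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:        %d\n",
               prop.maxThreadsPerMultiProcessor);
    }
    return 0;
}
```

Kernels can use such queries at launch time to pick a grid and block configuration that matches the device's resource limits rather than hard-coding them.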

Original language: English (US)
Title of host publication: Programming Massively Parallel Processors
Subtitle of host publication: A Hands-on Approach, Fourth Edition
Number of pages: 24
ISBN (Electronic): 9780323912310
ISBN (Print): 9780323984638
State: Published - Jan 1 2022


Keywords

  • thread scheduling
  • barrier synchronization
  • control divergence
  • deadlock
  • device property query
  • dynamic resource partitioning
  • latency tolerance
  • linear layout of threads
  • occupancy
  • streaming multiprocessors
  • transparent scalability
  • warp scheduling
  • zero-overhead thread scheduling

ASJC Scopus subject areas

  • General Computer Science

