Compute architecture and scheduling

Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj

Research output: Chapter in Book/Report/Conference proceeding › Chapter


This chapter introduces key concepts in the compute architectures of modern GPUs that are important to CUDA C programmers. It first gives an overview of the GPU execution resources, such as streaming multiprocessors (SMs). It then discusses how blocks are assigned to SMs and divided into warps for scheduling purposes. It goes on to detail the single-instruction, multiple-data (SIMD) execution hardware, warp scheduling, latency tolerance, control divergence, and the effects of resource limitations. The chapter concludes with an introduction to the concept of resource queries.
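As a minimal sketch of the resource-query concept mentioned in the abstract, the CUDA runtime API exposes device properties such as the number of SMs and the warp size. The program below assumes a system with the CUDA toolkit installed and at least one CUDA-capable device; error handling is omitted for brevity.

```cuda
// Sketch: query compute resources via the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int devCount = 0;
    cudaGetDeviceCount(&devCount);

    for (int i = 0; i < devCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);

        // Execution resources discussed in the chapter:
        // SM count, warp size, and per-block/per-SM thread limits.
        printf("Device %d: %s\n", i, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Warp size:                 %d\n", prop.warpSize);
        printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:        %d\n",
               prop.maxThreadsPerMultiProcessor);
    }
    return 0;
}
```

Kernels can use such queries at launch time to pick a grid and block configuration that matches the device's resource limits rather than hard-coding them.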

Original language: English (US)
Title of host publication: Programming Massively Parallel Processors
Subtitle of host publication: A Hands-on Approach, Fourth Edition
Number of pages: 24
ISBN (Electronic): 9780323912310
ISBN (Print): 9780323984638
State: Published - Jan 1 2022


Keywords

  • thread scheduling
  • barrier synchronization
  • control divergence
  • deadlock
  • device property query
  • dynamic resource partitioning
  • latency tolerance
  • linear layout of threads
  • occupancy
  • streaming multiprocessors
  • transparent scalability
  • warp scheduling
  • zero-overhead thread scheduling

ASJC Scopus subject areas

  • General Computer Science

