Parallel histogram: An introduction to atomic operations and privatization

Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj

This chapter introduces the parallel histogram computation pattern and the concept of atomic operations. It shows that atomic operations to the same location are serialized and that their throughput is determined by their latency. It further introduces four important optimization techniques: thread coarsening-based interleaved data partitioning for improved memory coalescing, caching for reduced latency and improved throughput of atomic operations, privatization for reduced contention, and aggregation for reduced contention.
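As a rough illustration of privatization, the following is a minimal CPU-side sketch in C++, not the chapter's own code. It assumes a text histogram in which the 26 lowercase letters are grouped four per bin (seven bins). The names `kNumBins`, `bin_of`, and `histogram_privatized` are hypothetical. Each thread reads its input with an interleaved (strided) partition, counts into a private copy, and then commits each nonzero bin to the shared histogram with a single atomic add, so contended atomic updates drop from one per input element to at most one per bin per thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Assumption for this sketch: 26 lowercase letters, 4 letters per bin.
constexpr std::size_t kNumBins = 7;

// Map a lowercase letter to its bin; return kNumBins for any other character.
std::size_t bin_of(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<std::size_t>(c - 'a') / 4
                                  : kNumBins;
}

// Privatized histogram: threads count into private copies, then merge them
// into the shared histogram using atomic read-modify-write operations.
std::array<unsigned, kNumBins> histogram_privatized(const std::string& text,
                                                    unsigned num_threads) {
    assert(num_threads > 0);

    std::array<std::atomic<unsigned>, kNumBins> shared;
    for (auto& bin : shared) bin.store(0);

    auto worker = [&](unsigned tid) {
        // Private copy: updates here need no atomics and cause no contention.
        std::array<unsigned, kNumBins> local{};
        // Interleaved partitioning: thread tid takes elements tid, tid + T, ...
        for (std::size_t i = tid; i < text.size(); i += num_threads) {
            const std::size_t b = bin_of(text[i]);
            if (b < kNumBins) ++local[b];
        }
        // Commit: one atomic add per nonzero bin instead of one per element.
        for (std::size_t b = 0; b < kNumBins; ++b) {
            if (local[b] > 0) shared[b].fetch_add(local[b]);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();

    std::array<unsigned, kNumBins> result{};
    for (std::size_t b = 0; b < kNumBins; ++b) result[b] = shared[b].load();
    return result;
}
```

Under these assumptions, `histogram_privatized("programming massively parallel processors", 4)` yields the bin counts 5, 5, 6, 10, 10, 1, 1 for the letter ranges a-d through y-z. On a GPU the private copies would more likely be kept per thread block, but that detail is outside this sketch.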

Original language: English (US)
Title of host publication: Programming Massively Parallel Processors
Subtitle of host publication: A Hands-on Approach, Fourth Edition
Number of pages: 20
ISBN (Electronic): 9780323912310
ISBN (Print): 9780323984638
State: Published - Jan 1 2022

Keywords


  • Histogram
  • atomic operation
  • contiguous partitioning
  • feature extraction
  • interleaved partitioning
  • memory bound
  • memory coalescing
  • memory latency
  • memory throughput
  • output interference
  • race condition
  • read-modify-write

ASJC Scopus subject areas

  • General Computer Science


