
Inter-kernel Reuse-aware Thread Block Scheduling

  • Muhammad Huzaifa
  • Johnathan Alsop
  • Abdulrahman Mahmoud
  • Giordano Salvador
  • Matthew D. Sinclair
  • Sarita V. Adve

Research output: Contribution to journal > Article > peer-review

Abstract

As GPUs have become more programmable, their performance and energy benefits have made them increasingly popular. However, while GPU compute units continue to improve in performance, on-chip memories lag behind and data accesses are becoming increasingly expensive in performance and energy. Emerging GPU coherence protocols can mitigate this bottleneck by exploiting data reuse in GPU caches across kernel boundaries. Unfortunately, current GPU thread block schedulers are typically not designed to expose such reuse. This article proposes new hardware thread block schedulers that optimize inter-kernel reuse while using work stealing to preserve load balance. Our schedulers are simple, decentralized, and have extremely low overhead. Compared to a baseline round-robin scheduler, the best performing scheduler reduces average execution time and energy by 19% and 11%, respectively, in regular applications, and 10% and 8%, respectively, in irregular applications.
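The idea in the abstract can be illustrated with a toy simulation. The sketch below is not the schedulers proposed in the article; it is a minimal Python model under stated assumptions: block `b` of one kernel touches the same data as block `b` of the next kernel, each compute unit (CU) has a private cache, and the baseline round-robin pointer simply continues across kernels. All function names (`round_robin`, `reuse_aware_queues`, `run_with_stealing`, `reuse_hits`) are hypothetical and chosen for illustration only.

```python
from collections import deque


def round_robin(num_blocks, num_cus, start=0):
    """Toy baseline: block b is placed on CU (start + b) % num_cus."""
    return {b: (start + b) % num_cus for b in range(num_blocks)}


def reuse_aware_queues(num_blocks, num_cus):
    """Give each CU a fixed, contiguous chunk of block IDs.

    Using the same chunking for every kernel means block b tends to land
    on the CU that ran block b in the previous kernel (assumed reuse).
    """
    chunk = -(-num_blocks // num_cus)  # ceiling division
    queues = [deque() for _ in range(num_cus)]
    for b in range(num_blocks):
        queues[b // chunk].append(b)
    return queues


def run_with_stealing(queues, cost=lambda b: 1):
    """Drain per-CU queues; a CU with an empty queue steals from the
    tail of the longest queue. Returns (block -> CU map, finish time)."""
    num_cus = len(queues)
    free_at = [0] * num_cus  # time at which each CU becomes idle
    placement = {}
    while any(queues):
        # Next idle CU; on ties prefer a CU that still has local work.
        cu = min(range(num_cus), key=lambda c: (free_at[c], not queues[c]))
        if queues[cu]:
            block = queues[cu].popleft()  # local work keeps reuse
        else:
            victim = max(range(num_cus), key=lambda c: len(queues[c]))
            block = queues[victim].pop()  # steal to restore balance
        placement[block] = cu
        free_at[cu] += cost(block)
    return placement, max(free_at)


def reuse_hits(prev, curr):
    """Count blocks placed on the same CU in two consecutive kernels."""
    return sum(1 for b, cu in curr.items() if prev.get(b) == cu)


if __name__ == "__main__":
    cus, blocks = 4, 10
    # Baseline: the round-robin pointer continues where kernel 1 stopped.
    rr1 = round_robin(blocks, cus)
    rr2 = round_robin(blocks, cus, start=blocks % cus)
    print("round-robin reuse hits:", reuse_hits(rr1, rr2))

    # Reuse-aware queues with stealing, with a few expensive blocks.
    cost = lambda b: 4 if b < 2 else 1
    k1, t1 = run_with_stealing(reuse_aware_queues(blocks, cus), cost)
    k2, t2 = run_with_stealing(reuse_aware_queues(blocks, cus), cost)
    print("reuse-aware reuse hits:", reuse_hits(k1, k2), "finish time:", t2)
```

In this model, fixed per-CU queues keep a block on the CU that ran it last, and stealing from the tail of the longest queue moves only the work that would otherwise leave a CU idle. A real hardware scheduler also has to account for occupancy limits, cache capacity, and the coherence protocol, none of which are modeled here.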

Original language: English (US)
Article number: 3406538
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 17
Issue number: 3
State: Published - Aug 2020

Keywords

  • GPUs
  • caches
  • memory systems
  • scheduling

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
