Efficient GPU synchronization without scopes: Saying no to complex consistency models

Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

As GPUs have become increasingly general purpose, applications with more general sharing patterns and fine-grained synchronization have started to emerge. Unfortunately, conventional GPU coherence protocols are fairly simplistic, with heavyweight requirements for synchronization accesses. Prior work has tried to resolve these inefficiencies by adding scoped synchronization to conventional GPU coherence protocols, but the resulting memory consistency model, heterogeneous-race-free (HRF), is more complex than the common data-race-free (DRF) model. This work applies the DeNovo coherence protocol to GPUs and compares it with conventional GPU coherence under the DRF and HRF consistency models. The results show that the complexity of the HRF model is neither necessary nor sufficient to obtain high performance. DeNovo with DRF provides a sweet spot in performance, energy, overhead, and memory consistency model complexity. Specifically, for benchmarks with globally scoped fine-grained synchronization, compared to conventional GPU with HRF (GPU+HRF), DeNovo+DRF provides 28% lower execution time and 51% lower energy on average. For benchmarks with mostly locally scoped fine-grained synchronization, GPU+HRF is slightly better; however, this advantage requires a more complex consistency model and is eliminated with a modest enhancement to DeNovo+DRF. Further, if HRF's complexity is deemed acceptable, then DeNovo+HRF is the best protocol.

Original language: English (US)
Title of host publication: Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Publisher: IEEE Computer Society
Pages: 647-659
Number of pages: 13
ISBN (Electronic): 9781450340342
State: Published - Dec 5 2015
Event: 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015 - Waikiki, United States
Duration: Dec 5 2015 to Dec 9 2015

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 05-09-December-2015
ISSN (Print): 1072-4451

Other

Other: 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Country/Territory: United States
City: Waikiki
Period: 12/5/15 to 12/9/15

Keywords

  • GPGPU
  • cache coherence
  • data-race-free models
  • memory consistency models
  • synchronization

ASJC Scopus subject areas

  • Hardware and Architecture
