TY - JOUR
T1 - A Case for Fine-grain Coherence Specialization in Heterogeneous Systems
AU - Alsop, Johnathan
AU - Na, Weon Taek
AU - Sinclair, Matthew D.
AU - Grayson, Samuel
AU - Adve, Sarita
N1 - Publisher Copyright: Copyright © 2022 held by the owner/author(s).
PY - 2022/8/22
Y1 - 2022/8/22
AB - Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, communication between accelerators has been inefficient, typically orchestrated through explicit DMA transfers between different address spaces. More recently, industry has proposed unified coherent memory which enables implicit data movement and more data reuse, but often these interfaces limit the coherence flexibility available to heterogeneous systems. This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems. We propose an architecture that enables low-complexity independent specialization of each individual coherence request in heterogeneous workloads by building upon a simple and flexible baseline coherence interface, Spandex. We then describe how to optimize individual memory requests to improve cache reuse and performance-critical memory latency in emerging heterogeneous workloads. Collectively, our techniques enable significant gains, reducing execution time by up to 61% or network traffic by up to 99% while adding minimal complexity to the Spandex protocol.
KW - GPUs
KW - Shared memory systems
KW - caches
KW - coherence
UR - http://www.scopus.com/inward/record.url?scp=85139237014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139237014&partnerID=8YFLogxK
U2 - 10.1145/3530819
DO - 10.1145/3530819
M3 - Article
AN - SCOPUS:85139237014
SN - 1544-3566
VL - 19
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 3
M1 - 41
ER -