CoNDA: Efficient cache coherence support for near-data accelerators

Amirali Boroumand, Saugata Ghose, Minesh Patel, Hasan Hassan, Brandon Lucia, Rachata Ausavarungnirun, Kevin Hsieh, Nastaran Hajinazar, Krishna T. Malladi, Hongzhong Zheng, Onur Mutlu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Specialized on-chip accelerators are widely used to improve the energy efficiency of computing systems. Recent advances in memory technology have enabled near-data accelerators (NDAs), which reside off-chip close to main memory and can yield further benefits than on-chip accelerators. However, enforcing coherence with the rest of the system, which is already a major challenge for accelerators, becomes more difficult for NDAs. This is because (1) the cost of communication between NDAs and CPUs is high, and (2) NDA applications generate a lot of off-chip data movement. As a result, as we show in this work, existing coherence mechanisms eliminate most of the benefits of NDAs. We extensively analyze these mechanisms, and observe that (1) the majority of off-chip coherence traffic is unnecessary, and (2) much of the off-chip traffic can be eliminated if a coherence mechanism has insight into the memory accesses performed by the NDA. Based on our observations, we propose CoNDA, a coherence mechanism that lets an NDA optimistically execute an NDA kernel, under the assumption that the NDA has all necessary coherence permissions. This optimistic execution allows CoNDA to gather information on the memory accesses performed by the NDA and by the rest of the system. CoNDA exploits this information to avoid performing unnecessary coherence requests, and thus, significantly reduces data movement for coherence. We evaluate CoNDA using state-of-the-art graph processing and hybrid in-memory database workloads. Averaged across all of our workloads operating on modest data set sizes, CoNDA improves performance by 19.6% over the highest-performance prior coherence mechanism (66.0%/51.7% over a CPU-only/NDA-only system) and reduces memory system energy consumption by 18.0% over the most energy-efficient prior coherence mechanism (43.7% over CPU-only). CoNDA comes within 10.4% and 4.4% of the performance and energy of an ideal mechanism with no cost for coherence. The benefits of CoNDA increase with large data sets, as CoNDA improves performance over the highest-performance prior coherence mechanism by 38.3% (8.4x/7.7x over CPU-only/NDA-only), and comes within 10.2% of an ideal no-cost coherence mechanism.

Original languageEnglish (US)
Title of host publicationISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages629-642
Number of pages14
ISBN (Electronic)9781450366694
DOIs
StatePublished - Jun 22 2019
Externally publishedYes
Event46th International Symposium on Computer Architecture, ISCA 2019 - Phoenix, United States
Duration: Jun 22 2019Jun 26 2019

Publication series

NameProceedings - International Symposium on Computer Architecture
ISSN (Print)1063-6897

Conference

Conference46th International Symposium on Computer Architecture, ISCA 2019
Country/TerritoryUnited States
CityPhoenix
Period6/22/196/26/19

ASJC Scopus subject areas

  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'CoNDA: Efficient cache coherence support for near-data accelerators'. Together they form a unique fingerprint.

Cite this