TY - GEN
T1 - A task-centric memory model for scalable accelerator architectures
AU - Kelm, John H.
AU - Johnson, Daniel R.
AU - Lumetta, Steven Sam
AU - Frank, Matthew I.
AU - Patel, Sanjay Jeram
PY - 2009
Y1 - 2009
N2 - This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, we propose a memory model that uses a software protocol, working in collaboration with hardware caches, to maintain a coherent, single-address space view of memory without the need for hardware coherence support. We evaluate the task-centric memory model in simulation on a 1024-core MIMD accelerator we are developing that, with the help of a runtime system, implements the proposed memory model. We evaluate coherence management policies related to the task-centric memory model and show that the overhead of maintaining a coherent view of memory in software can be minimal. We further show that, while software management may constrain speculative hardware prefetching into local caches, a common optimization, it does not constrain the more relevant use case of off-chip prefetching from DRAM into shared caches.
UR - http://www.scopus.com/inward/record.url?scp=70449671562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449671562&partnerID=8YFLogxK
U2 - 10.1109/PACT.2009.16
DO - 10.1109/PACT.2009.16
M3 - Conference contribution
AN - SCOPUS:70449671562
SN - 9780769537719
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 77
EP - 87
BT - Proceedings - 2009 18th International Conference on Parallel Architectures and Compilation Techniques, PACT 2009
T2 - 2009 18th International Conference on Parallel Architectures and Compilation Techniques, PACT 2009
Y2 - 12 September 2009 through 16 September 2009
ER -