TY - GEN
T1 - CUBA
T2 - 22nd ACM International Conference on Supercomputing, ICS'08
AU - Gelado, Isaac
AU - Kelm, John H.
AU - Ryoo, Shane
AU - Lumetta, Steven S.
AU - Navarro, Nacho
AU - Hwu, Wen Mei W.
N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
N2 - Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized, and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors makes it difficult, if not impossible, to effectively utilize co-processors in large applications. This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.
AB - Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized, and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors makes it difficult, if not impossible, to effectively utilize co-processors in large applications. This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.
KW - Design
UR - http://www.scopus.com/inward/record.url?scp=57349092386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57349092386&partnerID=8YFLogxK
U2 - 10.1145/1375527.1375571
DO - 10.1145/1375527.1375571
M3 - Conference contribution
AN - SCOPUS:57349092386
SN - 9781605581583
T3 - Proceedings of the International Conference on Supercomputing
SP - 299
EP - 308
BT - ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing
Y2 - 7 June 2008 through 12 June 2008
ER -