CUBA: An architecture for efficient CPU/Co-processor data communication

Isaac Gelado, John H. Kelm, Shane Ryoo, Steven Sam Lumetta, Nacho Navarro, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors makes it difficult, if not impossible, to effectively utilize co-processors in large applications. This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.
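
The contrast the abstract draws between explicit data marshalling and hosting a data structure in co-processor local memory can be illustrated with a small software model. The sketch below is not taken from the paper: the class `CoprocessorMemory`, the functions `scale_kernel`, `run_with_marshalling`, and `run_with_hosted_buffer`, and the byte-scaling workload are all hypothetical names chosen for illustration. It only counts bulk copies; it does not model caches, interconnect latency, or the selective write-through policy itself.

```python
"""Toy model: explicit marshalling versus a hosted buffer.

Every name here is hypothetical and exists only for illustration;
this is not the interface or mechanism described in the paper.
"""


class CoprocessorMemory:
    """Stand-in for co-processor local memory that counts bulk-copied bytes."""

    def __init__(self):
        self.buffers = {}
        self.bulk_bytes_copied = 0

    def alloc(self, name, size):
        # Reserve a zero-filled region in the simulated local memory.
        self.buffers[name] = bytearray(size)
        return self.buffers[name]

    def copy_in(self, name, data):
        # Explicit marshalling: the whole structure is copied before the call.
        self.buffers[name][:] = data
        self.bulk_bytes_copied += len(data)


def scale_kernel(buf, factor):
    """Stand-in for a co-processor function that receives a pointer parameter."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * factor) % 256


def run_with_marshalling(values, factor):
    mem = CoprocessorMemory()
    host = bytearray(values)            # structure lives in CPU main memory
    dev = mem.alloc("data", len(host))
    mem.copy_in("data", host)           # marshal the input to the co-processor
    scale_kernel(dev, factor)
    host[:] = dev                       # marshal the output back to the CPU
    mem.bulk_bytes_copied += len(dev)
    return list(host), mem.bulk_bytes_copied


def run_with_hosted_buffer(values, factor):
    mem = CoprocessorMemory()
    hosted = mem.alloc("data", len(values))  # structure hosted in local memory
    for i, v in enumerate(values):
        hosted[i] = v                   # CPU stores land directly in the hosted region
    scale_kernel(hosted, factor)        # no bulk copy before or after the call
    return list(hosted), mem.bulk_bytes_copied


if __name__ == "__main__":
    print(run_with_marshalling([1, 2, 3, 4], 3))
    print(run_with_hosted_buffer([1, 2, 3, 4], 3))
```

Both functions compute the same result, but in this model only the marshalling variant performs bulk copies around the call. In real hardware the CPU's stores to a hosted structure still cross the interconnect, so the sketch shows the change in programming model rather than any measured performance benefit.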

Original language: English (US)
Title of host publication: ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing
Pages: 299-308
Number of pages: 10
DOI: https://doi.org/10.1145/1375527.1375571
State: Published - Dec 15 2008
Event: 22nd ACM International Conference on Supercomputing, ICS'08 - Island of Kos, Greece
Duration: Jun 7 2008 to Jun 12 2008

Publication series

Name: Proceedings of the International Conference on Supercomputing

Other

Other: 22nd ACM International Conference on Supercomputing, ICS'08
Country: Greece
City: Island of Kos
Period: 6/7/08 to 6/12/08

Fingerprint

Program processors
Communication
Data structures
Data storage equipment
Coprocessor

Keywords

  • Design

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Gelado, I., Kelm, J. H., Ryoo, S., Lumetta, S. S., Navarro, N., & Hwu, W-M. W. (2008). CUBA: An architecture for efficient CPU/Co-processor data communication. In ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing (pp. 299-308). (Proceedings of the International Conference on Supercomputing). https://doi.org/10.1145/1375527.1375571

Gelado, I, Kelm, JH, Ryoo, S, Lumetta, SS, Navarro, N & Hwu, W-MW 2008, CUBA: An architecture for efficient CPU/Co-processor data communication. in ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing. Proceedings of the International Conference on Supercomputing, pp. 299-308, 22nd ACM International Conference on Supercomputing, ICS'08, Island of Kos, Greece, 6/7/08. https://doi.org/10.1145/1375527.1375571
Gelado I, Kelm JH, Ryoo S, Lumetta SS, Navarro N, Hwu W-MW. CUBA: An architecture for efficient CPU/Co-processor data communication. In ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing. 2008. p. 299-308. (Proceedings of the International Conference on Supercomputing). https://doi.org/10.1145/1375527.1375571
Gelado, Isaac ; Kelm, John H. ; Ryoo, Shane ; Lumetta, Steven Sam ; Navarro, Nacho ; Hwu, Wen-Mei W. / CUBA : An architecture for efficient CPU/Co-processor data communication. ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing. 2008. pp. 299-308 (Proceedings of the International Conference on Supercomputing).
@inproceedings{bdbb2ed6719549f99b15daeb852a54c7,
title = "CUBA: An architecture for efficient CPU/Co-processor data communication",
abstract = "Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors makes it difficult, if not impossible, to effectively utilize co-processors in large applications. This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.",
keywords = "Design",
author = "Isaac Gelado and Kelm, {John H.} and Shane Ryoo and Lumetta, {Steven Sam} and Nacho Navarro and Hwu, {Wen-Mei W}",
year = "2008",
month = "12",
day = "15",
doi = "10.1145/1375527.1375571",
language = "English (US)",
isbn = "9781605581583",
series = "Proceedings of the International Conference on Supercomputing",
pages = "299--308",
booktitle = "ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing",

}

TY - GEN

T1 - CUBA

T2 - An architecture for efficient CPU/Co-processor data communication

AU - Gelado, Isaac

AU - Kelm, John H.

AU - Ryoo, Shane

AU - Lumetta, Steven Sam

AU - Navarro, Nacho

AU - Hwu, Wen-Mei W

PY - 2008/12/15

Y1 - 2008/12/15

AB - Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors makes it difficult, if not impossible, to effectively utilize co-processors in large applications. This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.

KW - Design

UR - http://www.scopus.com/inward/record.url?scp=57349092386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57349092386&partnerID=8YFLogxK

U2 - 10.1145/1375527.1375571

DO - 10.1145/1375527.1375571

M3 - Conference contribution

AN - SCOPUS:57349092386

SN - 9781605581583

T3 - Proceedings of the International Conference on Supercomputing

SP - 299

EP - 308

BT - ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing

ER -