NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems

Carl Pearson, I. Hsin Chung, Zehra Sura, Wen-Mei W Hwu, Jinjun Xiong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

High-performance computing increasingly relies on heterogeneous systems with specialized hardware accelerators to improve application performance. For example, NVIDIA’s CUDA programming system and general-purpose GPUs have emerged as a widespread accelerator in HPC systems. This trend has exacerbated challenges of data placement as accelerators often have fast local memories to fuel their computational demands, but slower interconnects to feed those memories. Crucially, real-world data-transfer performance is strongly influenced not just by the underlying hardware, but by the capabilities of the programming systems. Understanding how application performance is affected by the logical communication exposed through abstractions, as well as the underlying system topology, is crucial for developing high-performance applications and architectures. This report presents initial data-transfer microbenchmark results from two POWER-based systems obtained during work towards developing an automated system performance characterization tool.

Original language: English (US)
Title of host publication: High Performance Computing - ISC High Performance 2018 International Workshops, Revised Selected Papers
Editors: Michèle Weiland, Sadaf Alam, Rio Yokota, John Shalf
Publisher: Springer-Verlag
Pages: 448-454
Number of pages: 7
ISBN (Print): 9783030024642
DOIs: https://doi.org/10.1007/978-3-030-02465-9_32
State: Published - Jan 1 2018
Event: International Conference on High Performance Computing, ISC High Performance 2018 - Frankfurt, Germany
Duration: Jun 28 2018 → Jun 28 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11203 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on High Performance Computing, ISC High Performance 2018
Country: Germany
City: Frankfurt
Period: 6/28/18 → 6/28/18

Keywords

  • Benchmark
  • CUDA
  • GPGPU
  • NVLink
  • Unified Memory

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Pearson, C., Chung, I. H., Sura, Z., Hwu, W-M. W., & Xiong, J. (2018). NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems. In M. Weiland, S. Alam, R. Yokota, & J. Shalf (Eds.), High Performance Computing - ISC High Performance 2018 International Workshops, Revised Selected Papers (pp. 448-454). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11203 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-030-02465-9_32
