Data layout transformation exploiting memory-level parallelism in structured grid many-core applications

I. Jui Sung, Nasser Anssari, John A. Stratton, Wen-Mei W. Hwu

Research output: Contribution to journal › Article

Abstract

We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we enable automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also present how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe up to 10.94X speedup over the original layout, and a 1.16X performance gain in the worst case.
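
The kind of transformation the abstract describes can be illustrated with a minimal sketch in C99. This is not the authors' tool or their chosen layout: the field count `NE`, the tile width `TILE`, and every function name below are hypothetical, picked only for illustration. The original layout is declared with variable-length array syntax as `grid[ny][nx][NE]`; the transformed layout splits the x dimension into tiles and moves the field index outward, giving a logical shape of `[ny][nx/TILE][NE][TILE]`, at the cost of a more complex address calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters: fields per grid cell and tile width. */
enum { NE = 4, TILE = 16 };

/* Original layout, written with C99 variable-length array syntax.
   The array extents are visible in the parameter list, which is the
   information a compiler could use to rewrite the indexing. */
float read_original(size_t ny, size_t nx, float grid[ny][nx][NE],
                    size_t y, size_t x, size_t e)
{
    assert(y < ny && x < nx && e < NE);
    return grid[y][x][e];
}

/* Flat offset of element (y, x, e) in the original [ny][nx][NE] layout. */
size_t offset_original(size_t nx, size_t y, size_t x, size_t e)
{
    return (y * nx + x) * NE + e;
}

/* Flat offset of the same element in the assumed transformed layout
   [ny][nx/TILE][NE][TILE]. Requires nx to be a multiple of TILE. */
size_t offset_tiled(size_t nx, size_t y, size_t x, size_t e)
{
    assert(nx % TILE == 0);
    size_t tiles_per_row = nx / TILE;
    return ((y * tiles_per_row + x / TILE) * NE + e) * TILE + x % TILE;
}

/* Copy a grid from the original layout into the transformed one.
   src and dst must each hold ny * nx * NE floats and must not overlap. */
void to_tiled(size_t ny, size_t nx, const float *src, float *dst)
{
    for (size_t y = 0; y < ny; ++y)
        for (size_t x = 0; x < nx; ++x)
            for (size_t e = 0; e < NE; ++e)
                dst[offset_tiled(nx, y, x, e)] =
                    src[offset_original(nx, y, x, e)];
}
```

Under these assumptions, neighboring cells in x that access the same field sit `NE` floats apart in the original layout but one float apart in the transformed layout, so concurrent accesses from adjacent threads fall on consecutive addresses. Which split and permutation is actually best depends on the memory system model, which this sketch does not attempt to capture.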

Original language: English (US)
Pages (from-to): 4-24
Number of pages: 21
Journal: International Journal of Parallel Programming
Volume: 40
Issue number: 1
DOI: 10.1007/s10766-011-0182-5
State: Published - Feb 1 2012

Fingerprint

Many-core
Parallelism
Layout
Grid
Data storage equipment
Speedup
Compiler Optimization
Fluid dynamics
Performance Optimization
Partial differential equations
Data structures
Computer systems
Concurrent
Heat
Choose

Keywords

  • Data layout transformation
  • GPU
  • Parallel programming

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems

Cite this

Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. / Sung, I. Jui; Anssari, Nasser; Stratton, John A.; Hwu, Wen-Mei W.

In: International Journal of Parallel Programming, Vol. 40, No. 1, 01.02.2012, p. 4-24.


@article{6563fd87c4ac401ba63fa6d219b4828c,
title = "Data layout transformation exploiting memory-level parallelism in structured grid many-core applications",
abstract = "We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we enable automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also present how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe up to 10.94X speedup over the original layout, and a 1.16X performance gain in the worst case.",
keywords = "Data layout transformation, GPU, Parallel programming",
author = "Sung, {I. Jui} and Anssari, Nasser and Stratton, {John A.} and Hwu, {Wen-Mei W.}",
year = "2012",
month = "2",
day = "1",
doi = "10.1007/s10766-011-0182-5",
language = "English (US)",
volume = "40",
pages = "4--24",
journal = "International Journal of Parallel Programming",
issn = "0885-7458",
publisher = "Springer New York",
number = "1",
}

TY - JOUR

T1 - Data layout transformation exploiting memory-level parallelism in structured grid many-core applications

AU - Sung, I. Jui

AU - Anssari, Nasser

AU - Stratton, John A.

AU - Hwu, Wen-Mei W.

PY - 2012/2/1

Y1 - 2012/2/1

AB - We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we enable automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also present how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe up to 10.94X speedup over the original layout, and a 1.16X performance gain in the worst case.

KW - Data layout transformation

KW - GPU

KW - Parallel programming

UR - http://www.scopus.com/inward/record.url?scp=84856256277&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84856256277&partnerID=8YFLogxK

U2 - 10.1007/s10766-011-0182-5

DO - 10.1007/s10766-011-0182-5

M3 - Article

AN - SCOPUS:84856256277

VL - 40

SP - 4

EP - 24

JO - International Journal of Parallel Programming

JF - International Journal of Parallel Programming

SN - 0885-7458

IS - 1

ER -