Data layout transformation exploiting memory-level parallelism in structured grid many-core applications

I. Jui Sung, John A. Stratton, Wen Mei W. Hwu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we have enabled automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also present how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe performance increases of up to 560% over the language-defined layout, and a 7% performance gain in the worst case, in which the language-defined layout and access pattern are already well-vectorizable by the underlying hardware.
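
The two ideas in the abstract (C99 variable-length array syntax exposing grid extents for heap-allocated arrays, and a layout that spreads concurrent requests across parallel memory components) can be illustrated with a small C99 sketch. This is a hypothetical example, not the authors' transformation or tool. It assumes a toy memory model in which consecutive 64-element blocks are interleaved round-robin across 8 channels (NCHAN and CHUNK are invented parameters, not those of any real GPU), and it uses a simple "split the x dimension and move the chunk index outermost" layout as a stand-in for a transformed layout. The function names row_major, split_interchange, channel_of, distinct_channels, and vla_offset_matches are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy memory model (assumption, not a real GPU): consecutive CHUNK-element
   blocks are interleaved round-robin across NCHAN parallel channels. */
#define NCHAN 8
#define CHUNK 64

/* Language-defined (C row-major) offset of element [y][x] in an ny x nx grid. */
size_t row_major(size_t y, size_t x, size_t nx) {
    return y * nx + x;
}

/* Hypothetical transformed layout: split x into (xo, xi) with xi < CHUNK and
   make xo the outermost index, so the same CHUNK-wide strip of consecutive
   rows occupies consecutive blocks. Requires nx % CHUNK == 0. */
size_t split_interchange(size_t y, size_t x, size_t ny) {
    size_t xo = x / CHUNK, xi = x % CHUNK;
    return (xo * ny + y) * CHUNK + xi;
}

/* Channel serving a given element offset under the toy model. */
unsigned channel_of(size_t offset) {
    return (unsigned)((offset / CHUNK) % NCHAN);
}

/* Distinct channels touched when NCHAN concurrent requests read rows
   y = 0 .. NCHAN-1 at the same column x (transformed == 0: row-major). */
int distinct_channels(int transformed, size_t ny, size_t nx, size_t x) {
    assert(ny >= NCHAN && nx % CHUNK == 0 && x < nx);
    int used[NCHAN] = {0};
    int count = 0;
    for (size_t y = 0; y < NCHAN; y++) {
        size_t off = transformed ? split_interchange(y, x, ny)
                                 : row_major(y, x, nx);
        unsigned c = channel_of(off);
        if (!used[c]) {
            used[c] = 1;
            count++;
        }
    }
    return count;
}

/* C99 VLA pointer: the grid is heap-allocated, yet its extent nx is part of
   the pointer's type, so g[y][x] carries the full row-major index math.
   Returns 1 if that offset equals row_major(y, x, nx). */
int vla_offset_matches(size_t ny, size_t nx, size_t y, size_t x) {
    float (*g)[nx] = malloc(sizeof(float[ny][nx]));
    if (g == NULL)
        return 0;
    size_t off = (size_t)((char *)&g[y][x] - (char *)g) / sizeof(float);
    free(g);
    return off == row_major(y, x, nx);
}
```

With ny = 64 and nx = 1024 (a power-of-two row pitch, common in stencil codes), distinct_channels(0, 64, 1024, 0) returns 1: under this toy model, all eight concurrent row reads in the language-defined layout fall on the same channel. distinct_channels(1, 64, 1024, 0) returns 8, one request per channel. The extra division and modulo in split_interchange illustrate the kind of address-calculation overhead the abstract mentions; choosing parameters such as the split factor from a memory-system model is the role the abstract assigns to the layout-selection tool.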

Original language: English (US)
Title of host publication: PACT'10 - Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 513-522
Number of pages: 10
ISBN (Print): 9781450301787
DOI: https://doi.org/10.1145/1854273.1854336
State: Published - Jan 1 2010

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2010
ISSN (Print): 1089-795X

Keywords

  • GPU
  • data layout transformation
  • parallel programming

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Sung, I. J., Stratton, J. A., & Hwu, W. M. W. (2010). Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. In PACT'10 - Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (pp. 513-522). (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT; Vol. 2010). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/1854273.1854336
