Data layout transformation exploiting memory-level parallelism in structured grid many-core applications

I-Jui Sung, Nasser Anssari, John A. Stratton, Wen-mei W. Hwu

Research output: Contribution to journal › Article

Abstract

We present automatic data layout transformation as an effective compiler optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures that use a dense, regular grid as their primary data structure. Fluid dynamics and heat distribution, both of which solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and available in other modern languages, we enable automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also show how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe speedups over the original layout of up to 10.94X, and of 1.16X in the worst case.
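As a hedged illustration of two ideas in the abstract, the C sketch below (not the authors' implementation) first uses a C99 VLA pointer to view a dynamically allocated grid with its extents intact, then compares the original row-major layout against one hypothetical split-and-permute layout under a toy channel-interleaving memory model. The channel count, interleave granularity, grid sizes, `TILE`, and all function names are assumptions chosen for the example; real GPU address mappings are more complex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy memory model (assumption): NUM_CHANNELS DRAM channels, with
   consecutive CHANNEL_BYTES-byte chunks of the address space assigned
   to channels round-robin. */
#define NUM_CHANNELS  8
#define CHANNEL_BYTES 256

/* Illustrative grid extents and split factor (hypothetical values). */
#define NX   256
#define NY   256
#define NZ   64
#define TILE 64   /* x is split into NX / TILE chunks of TILE elements */

/* C99 VLA view: a pointer of type float (*)[ny][nx] carries the runtime
   extents, so grid[z][y][x] keeps the full index tuple visible instead of
   a hand-linearized offset. Returns 1 if the VLA subscript and the
   row-major formula (z*ny + y)*nx + x name the same element. */
static int vla_matches_row_major(size_t nz, size_t ny, size_t nx,
                                 size_t z, size_t y, size_t x) {
    assert(z < nz && y < ny && x < nx);
    float *flat = malloc(nz * ny * nx * sizeof *flat);
    if (flat == NULL) return 0;
    float (*grid)[ny][nx] = (float (*)[ny][nx])flat;
    int same = &grid[z][y][x] == &flat[(z * ny + y) * nx + x];
    free(flat);
    return same;
}

/* Original layout [NZ][NY][NX]: element offset of (z, y, x). */
static size_t offset_row_major(size_t z, size_t y, size_t x) {
    assert(z < NZ && y < NY && x < NX);
    return (z * NY + y) * NX + x;
}

/* Hypothetical transformed layout [NY][NX/TILE][NZ][TILE]: x is split into
   (x / TILE, x % TILE) and z is moved inside, so the same x-chunk from
   neighbouring z-planes sits in adjacent memory chunks while each
   TILE-wide row segment stays contiguous. */
static size_t offset_transformed(size_t z, size_t y, size_t x) {
    assert(z < NZ && y < NY && x < NX);
    size_t xo = x / TILE, xi = x % TILE;
    return ((y * (NX / TILE) + xo) * NZ + z) * TILE + xi;
}

/* Channel serving the float at a given element offset (4-byte float). */
static unsigned channel_of(size_t elem_offset) {
    size_t byte_addr = elem_offset * sizeof(float);
    return (unsigned)((byte_addr / CHANNEL_BYTES) % NUM_CHANNELS);
}

/* Count distinct channels hit by NUM_CHANNELS concurrent requests, one per
   z-plane, each reading element (z, 0, 0), as when thread blocks working
   on different planes issue loads at the same time. */
static int distinct_channels(size_t (*offset)(size_t, size_t, size_t)) {
    int used[NUM_CHANNELS] = {0};
    int count = 0;
    for (size_t z = 0; z < NUM_CHANNELS; z++) {
        unsigned c = channel_of(offset(z, 0, 0));
        if (!used[c]) {
            used[c] = 1;
            count++;
        }
    }
    return count;
}
```

Under these toy parameters, `distinct_channels(offset_row_major)` is 1, because consecutive z-planes start 256 KiB apart and all land on channel 0, while `distinct_channels(offset_transformed)` is 8, one request per channel. The abstract describes a tool that chooses among candidate layouts using a memory-system model; this sketch only hard-codes a single candidate to show why such a layout can raise memory-level parallelism.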

Original language: English (US)
Pages (from-to): 4-24
Number of pages: 21
Journal: International Journal of Parallel Programming
Volume: 40
Issue number: 1
State: Published (Feb 1 2012)

Keywords

  • Data layout transformation
  • GPU
  • Parallel programming

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
