Code transformations to improve memory parallelism

Vijay S. Pai, Sarita Adve

Research output: Contribution to journal › Conference article › peer-review

Abstract

Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than previous-generation systems. These deficiencies arise largely because applications present limited opportunities for an out-of-order issue processor to overlap multiple read misses, the dominant source of memory stalls. This work proposes code transformations to increase parallelism in the memory system by overlapping multiple read misses within the same instruction window, while preserving cache locality. We present an analysis and transformation framework suitable for compiler implementation. Our simulation experiments show substantial increases in memory parallelism, leading to execution time reductions averaging 23% in a multiprocessor and 30% in a uniprocessor. We see similar benefits on a Convex Exemplar.
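As a rough illustration of what overlapping read misses within one instruction window can look like, the sketch below applies unroll-and-jam to a row-wise matrix sum. The abstract does not name the specific transformations, so the choice of unroll-and-jam, the function names `sum_baseline` and `sum_clustered`, the array sizes, and the unroll factor of 4 are all assumptions made for this example, not the authors' implementation.

```c
#include <stddef.h>

#define ROWS 10    /* hypothetical sizes chosen for illustration */
#define COLS 1024

/* Baseline: the inner loop walks a single row. Consecutive read misses
   are separated by a full cache line's worth of iterations, so an
   out-of-order window is unlikely to hold more than one at a time. */
double sum_baseline(double a[ROWS][COLS]) {
    double sum = 0.0;
    for (size_t j = 0; j < ROWS; j++)
        for (size_t i = 0; i < COLS; i++)
            sum += a[j][i];
    return sum;
}

/* Unroll-and-jam (assumed factor 4): the outer loop is unrolled and the
   copies of the inner loop are fused. Each inner iteration now touches
   four different rows, hence four different cache lines, so their misses
   can be outstanding together. Each row is still read contiguously,
   which keeps the spatial locality of the baseline. */
double sum_clustered(double a[ROWS][COLS]) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t j = 0;
    for (; j + 4 <= ROWS; j += 4) {
        for (size_t i = 0; i < COLS; i++) {
            s0 += a[j][i];       /* independent loads to four rows */
            s1 += a[j + 1][i];
            s2 += a[j + 2][i];
            s3 += a[j + 3][i];
        }
    }
    /* Remainder rows when ROWS is not a multiple of 4. */
    for (; j < ROWS; j++)
        for (size_t i = 0; i < COLS; i++)
            s0 += a[j][i];
    return (s0 + s1) + (s2 + s3);
}
```

Both functions compute the same sum up to floating-point reassociation. Whether the clustered version actually runs faster depends on the cache line size, the instruction window size, and how many misses the memory system can keep outstanding, none of which this sketch models.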

Original language: English (US)
Pages (from-to): 147-155
Number of pages: 9
Journal: Proceedings of the Annual International Symposium on Microarchitecture
State: Published - Dec 1 1999
Externally published: Yes
Event: Proceedings of the 1999 32nd Annual ACM/IEEE International Symposium on Microarchitecture, MICRO-32 - Haifa, Isr
Duration: Nov 16 1999 → Nov 18 1999

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
