Tolerating data access latency with register preloading

William Y. Chen, Scott A. Mahlke, Wen Mei W. Hwu, Tokuzo Kiyohara, Pohua P. Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency, but conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, that allows the compiler to move load instructions even when the results of data dependence analysis are inconclusive. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
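The coherence idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `PreloadRegisterFile`, its methods, the per-register address tag, and the rule that a register stops being tracked at its first use are all hypothetical names and simplifications chosen for illustration.

```python
class PreloadRegisterFile:
    """Toy model (assumed design): a register remembers the address it was preloaded from."""

    def __init__(self, num_regs, memory):
        self.values = [0] * num_regs
        self.preload_addr = [None] * num_regs  # None means no active preload
        self.memory = memory                   # dict mapping address -> value

    def preload(self, reg, addr):
        # A load hoisted above possibly aliasing stores: read memory early
        # and remember the source address.
        self.values[reg] = self.memory.get(addr, 0)
        self.preload_addr[reg] = addr

    def store(self, addr, value):
        # Write memory, then update every register whose remembered
        # preload address matches the store address.
        self.memory[addr] = value
        for reg, tagged in enumerate(self.preload_addr):
            if tagged == addr:
                self.values[reg] = value

    def use(self, reg):
        # Assumption: the first use marks the load's original position,
        # so the register stops tracking its address from here on.
        self.preload_addr[reg] = None
        return self.values[reg]


memory = {0x100: 1, 0x200: 7}
rf = PreloadRegisterFile(4, memory)
rf.preload(1, 0x100)   # load moved above the two stores below
rf.store(0x200, 9)     # different address: register 1 is unaffected
rf.store(0x100, 42)    # same address: register 1 is updated
result = rf.use(1)     # sees the stored value, as if the load had not moved
```

In this model the hoisted load still observes the value written by the later aliasing store, which is the property that lets a compiler move the load without conclusive dependence information.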

Original language: English (US)
Title of host publication: Proceedings of the 6th International Conference on Supercomputing, ICS 1992
Publisher: Association for Computing Machinery
Pages: 104-113
Number of pages: 10
ISBN (Electronic): 0897914856
State: Published - Aug 1 1992
Event: 6th International Conference on Supercomputing, ICS 1992 - Washington, United States
Duration: Jul 19 1992 to Jul 24 1992

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: Part F129617

Other

Other: 6th International Conference on Supercomputing, ICS 1992
Country/Territory: United States
City: Washington
Period: 7/19/92 to 7/24/92

Keywords

  • Data dependence analysis
  • Load latency
  • Register file
  • Register preload
  • VLIW/superscalar processor

ASJC Scopus subject areas

  • Computer Science(all)
