Compiler-directed early load-address generation

Ben Chung Cheng, Daniel A. Connors, Wen-Mei W Hwu

Research output: Contribution to journal, Conference article, peer-review


Two orthogonal hardware techniques for reducing the latency of load instructions, table-based address prediction and early address calculation, have recently been proposed. Both techniques speculatively perform loads early in the processor pipeline using predicted values for the loads' addresses. To predict the important loads accurately in the presence of a large number of less-important loads, these techniques have required either a large hardware table or complex register bypass logic. This paper proposes a compiler-directed approach that allows streamlined versions of both techniques to be used effectively together. The compiler provides directives indicating which prediction mechanism to use or, when appropriate, that no prediction should be made. Each hardware mechanism can therefore focus on its target cases, so a smaller prediction table and simpler bypass logic suffice. Our results show that, with straightforward compiler heuristics, we obtain an average speedup of 34% using a 256-entry direct-mapped address table and only one cached register. With the help of address profiling, an additional 4% speedup can be obtained.
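The interplay of compiler directives and the two prediction mechanisms can be sketched as a small Python model. This is an illustrative sketch under stated assumptions, not the authors' hardware design. The abstract specifies only a 256-entry direct-mapped address table, one cached register, and directives that select a mechanism or suppress prediction. Everything else here is hypothetical: the hint names (`LoadHint`), the stride-based table entry, the PC indexing function, and the cached-register interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LoadHint(Enum):
    """Hypothetical per-load directive emitted by the compiler."""
    NO_PREDICT = 0   # do not speculate on this load's address
    TABLE = 1        # use the table-based address predictor
    EARLY_CALC = 2   # compute base + offset early from the cached register


@dataclass
class TableEntry:
    tag: int = -1        # PC of the load that owns this slot (-1 = empty)
    last_addr: int = 0   # last effective address seen for that load
    stride: int = 0      # difference between its last two addresses


class AddressTable:
    """Direct-mapped address table; the stride entry format is an assumption."""

    def __init__(self, entries: int = 256) -> None:
        self.entries = entries
        self.slots: List[TableEntry] = [TableEntry() for _ in range(entries)]

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions: drop the low 2 bits, then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> Optional[int]:
        e = self.slots[self._index(pc)]
        if e.tag != pc:
            return None  # tag mismatch: no prediction
        return e.last_addr + e.stride

    def update(self, pc: int, actual: int) -> None:
        e = self.slots[self._index(pc)]
        if e.tag == pc:
            e.stride = actual - e.last_addr
        else:
            # Evict whichever load previously mapped to this slot.
            e.tag, e.stride = pc, 0
        e.last_addr = actual


class CachedRegister:
    """Single cached register for early address calculation (assumed interface)."""

    def __init__(self) -> None:
        self.reg: Optional[int] = None
        self.value = 0

    def write(self, reg: int, value: int) -> None:
        # Record the latest value of the register designated for caching.
        self.reg, self.value = reg, value

    def predict(self, base_reg: int, offset: int) -> Optional[int]:
        if self.reg != base_reg:
            return None  # base register is not the cached one
        return self.value + offset


def predict_address(hint: LoadHint, pc: int, base_reg: int, offset: int,
                    table: AddressTable, cached: CachedRegister) -> Optional[int]:
    """Dispatch to the mechanism the compiler selected, or decline to predict."""
    if hint is LoadHint.TABLE:
        return table.predict(pc)
    if hint is LoadHint.EARLY_CALC:
        return cached.predict(base_reg, offset)
    return None


def retire(hint: LoadHint, pc: int, actual: int, table: AddressTable) -> None:
    """Only TABLE-hinted loads train the table in this model."""
    if hint is LoadHint.TABLE:
        table.update(pc, actual)


if __name__ == "__main__":
    table, cached = AddressTable(256), CachedRegister()
    for addr in (0x1000, 0x1008, 0x1010):
        retire(LoadHint.TABLE, 0x400100, addr, table)
    print(hex(table.predict(0x400100)))  # next address on the learned stride
```

In this model only `TABLE`-hinted loads update the table. A load the compiler marks `NO_PREDICT` therefore cannot evict an important load's entry, even if the two map to the same slot. That filtering is what lets a small direct-mapped table remain effective here.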

Original language: English (US)
Pages (from-to): 138-147
Number of pages: 10
Journal: Proceedings of the Annual International Symposium on Microarchitecture
State: Published - Dec 1 1998
Event: Proceedings of the 1998 31st Annual ACM/IEEE International Symposium on Microarchitecture - Dallas, TX, USA
Duration: Nov 30 1998 to Dec 2 1998

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software

