Abstract
Two orthogonal hardware techniques have recently been proposed for reducing the latency of load instructions: table-based address prediction and early address calculation. The key idea behind both techniques is to perform loads speculatively early in the processor pipeline, using predicted values for the loads' addresses. To accurately predict the important loads in the presence of many less-important loads, these techniques have required either a large hardware table or complex register bypass logic. This paper proposes a compiler-directed approach that allows streamlined versions of both techniques to be used effectively together. The compiler provides directives that indicate which prediction mechanism to use or, when appropriate, that no prediction should be made. The hardware can therefore focus on its target cases, so a smaller prediction table and simpler bypass logic suffice. Our results show that straightforward compiler heuristics yield an average speedup of 34% with a 256-entry direct-mapped address table and only one cached register. With the help of address profiling, an additional 4% speedup can be obtained.
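The abstract describes the mechanisms only at a high level. As a rough illustration, the following Python sketch models how a per-load compiler hint might steer each load to one of two predictors, or to none. Everything here is an assumption rather than the paper's design: the `Directive` names, the stride-based update policy of the 256-entry direct-mapped table, the 4-byte instruction size used for indexing, and which register the single cached-register slot holds (register 29 in the usage lines is an arbitrary, hypothetical choice).

```python
# Illustrative sketch only; names, indexing, and update policies are assumptions,
# not the implementation evaluated in the paper.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Directive(Enum):
    """Hypothetical per-load hint encoded by the compiler."""
    NONE = "none"    # do not predict this load
    TABLE = "table"  # use table-based address prediction
    EARLY = "early"  # use early address calculation


@dataclass
class Entry:
    tag: int = -1      # -1 marks an empty slot
    last_addr: int = 0
    stride: int = 0


class AddressTable:
    """Direct-mapped, PC-indexed stride predictor (an assumed design)."""

    def __init__(self, entries: int = 256) -> None:
        self.entries = entries
        self.slots = [Entry() for _ in range(entries)]

    def _locate(self, pc: int) -> tuple[Entry, int]:
        word = pc >> 2                    # assume 4-byte instructions
        return self.slots[word % self.entries], word // self.entries

    def predict(self, pc: int) -> Optional[int]:
        entry, tag = self._locate(pc)
        if entry.tag != tag:
            return None                   # miss: no prediction
        return entry.last_addr + entry.stride

    def update(self, pc: int, addr: int) -> None:
        entry, tag = self._locate(pc)
        if entry.tag == tag:
            entry.stride = addr - entry.last_addr
        else:                             # cold or conflict miss: replace entry
            entry.tag, entry.stride = tag, 0
        entry.last_addr = addr


class CachedRegister:
    """One register value kept near the front end for early address calculation."""

    def __init__(self, reg: int) -> None:
        self.reg = reg
        self.value: Optional[int] = None

    def write(self, reg: int, value: int) -> None:
        if reg == self.reg:               # only track the designated register
            self.value = value

    def predict(self, base_reg: int, offset: int) -> Optional[int]:
        if base_reg != self.reg or self.value is None:
            return None
        return self.value + offset


def predict_address(hint: Directive, pc: int, base_reg: int, offset: int,
                    table: AddressTable, cache: CachedRegister) -> Optional[int]:
    """Consult only the mechanism named by the compiler directive."""
    if hint is Directive.TABLE:
        return table.predict(pc)
    if hint is Directive.EARLY:
        return cache.predict(base_reg, offset)
    return None


if __name__ == "__main__":
    table = AddressTable()
    sp = CachedRegister(reg=29)           # hypothetical: cache register 29
    sp.write(29, 0x7FFF0000)
    for addr in (0x1000, 0x1008):         # a strided load at PC 0x400
        table.update(0x400, addr)
    print(hex(predict_address(Directive.TABLE, 0x400, 0, 0, table, sp)))   # 0x1010
    print(hex(predict_address(Directive.EARLY, 0x404, 29, 16, table, sp)))  # 0x7fff0010
    print(predict_address(Directive.NONE, 0x408, 0, 0, table, sp))          # None
```

If only `TABLE`-tagged loads are allowed to call `update`, loads the compiler marks `NONE` or `EARLY` never allocate entries and so cannot evict the entries of loads it does want predicted, which matches the abstract's intuition for why a small table can suffice.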
Original language | English (US) |
---|---|
Pages (from-to) | 138-147 |
Number of pages | 10 |
Journal | Proceedings of the Annual International Symposium on Microarchitecture |
State | Published - Dec 1 1998 |
Event | Proceedings of the 1998 31st Annual ACM/IEEE International Symposium on Microarchitecture, Dallas, TX, USA |
Duration | Nov 30 1998 → Dec 2 1998 |
ASJC Scopus subject areas
- Hardware and Architecture
- Software