TY - GEN
T1 - Tolerating data access latency with register preloading
AU - Chen, William Y.
AU - Mahlke, Scott A.
AU - Hwu, Wen Mei W.
AU - Kiyohara, Tokuzo
AU - Chang, Pohua P.
N1 - Funding Information:
The authors would like to acknowledge Bob Rau and Mike Schlansker at HP Labs, along with all members of the IMPACT research group for their comments and suggestions. Special thanks to the anonymous referees whose comments and suggestions helped to improve the quality of this paper significantly. This research has been supported by the Joint Services Engineering Programs (JSEP) under Contract N00014-90-J-1270, Dr. Lee Hoevel at NCR, the AMD 29K Advanced Processor Development Division, Matsushita Electric Industrial Co. Ltd., Hewlett-Packard, and the National Aeronautics and Space Administration (NASA) under Contract NASA NAG 1-613 in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS).
Publisher Copyright:
© 1992 ACM.
PY - 1992/8/1
Y1 - 1992/8/1
N2 - By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
KW - Data dependence analysis
KW - Load latency
KW - Register file
KW - Register preload
KW - VLIW/superscalar processor
UR - http://www.scopus.com/inward/record.url?scp=33646901785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646901785&partnerID=8YFLogxK
U2 - 10.1145/143369.143394
DO - 10.1145/143369.143394
M3 - Conference contribution
AN - SCOPUS:33646901785
T3 - Proceedings of the International Conference on Supercomputing
SP - 104
EP - 113
BT - Proceedings of the 6th International Conference on Supercomputing, ICS 1992
PB - Association for Computing Machinery
T2 - 6th International Conference on Supercomputing, ICS 1992
Y2 - 19 July 1992 through 24 July 1992
ER -