Tolerating data access latency with register preloading

William Y. Chen, Scott A. Mahlke, Wen-Mei W Hwu, Tokuzo Kiyohara, Pohua P. Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
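
The abstract does not describe the hardware in detail, so the following is only a minimal sketch of the idea, assuming that each register remembers the address it was preloaded from and that every store compares its address against those remembered addresses. The names ToyMachine, preload, store, use, update_on_store, and run_hoisted are hypothetical and do not come from the paper.

```python
class ToyMachine:
    """Toy model of a register file whose entries remember their preload address.

    This is an illustrative assumption, not the design from the paper.
    """

    def __init__(self, memory, update_on_store=True):
        self.memory = dict(memory)   # address -> value
        self.regs = {}               # register name -> value
        self.preload_addr = {}       # register name -> address it was preloaded from
        self.update_on_store = update_on_store

    def preload(self, reg, addr):
        # A load that the compiler hoisted above stores that may alias it.
        self.regs[reg] = self.memory[addr]
        self.preload_addr[reg] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        if self.update_on_store:
            # Keep preloaded registers coherent with the stored value.
            for reg, tagged_addr in self.preload_addr.items():
                if tagged_addr == addr:
                    self.regs[reg] = value

    def use(self, reg):
        # In this sketch, the first use ends the window in which stores are tracked.
        self.preload_addr.pop(reg, None)
        return self.regs[reg]


def run_hoisted(update_on_store, store_addr=0x10):
    """Run 'preload r1 <- [0x10]; store [store_addr] <- 42; use r1'."""
    machine = ToyMachine({0x10: 1, 0x20: 7}, update_on_store)
    machine.preload("r1", 0x10)
    machine.store(store_addr, 42)
    return machine.use("r1")
```

With the update enabled, a hoisted load that turns out to alias the store still observes the stored value, which is what the original store-then-load order would have produced. With the update disabled, the register keeps the stale value, which is why a compiler without such support cannot hoist the load when the dependence analysis is inconclusive.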

Original language: English (US)
Title of host publication: Proceedings of the 6th International Conference on Supercomputing, ICS 1992
Publisher: Association for Computing Machinery
Pages: 104-113
Number of pages: 10
ISBN (Electronic): 0897914856
DOI: https://doi.org/10.1145/143369.143394
State: Published - Aug 1 1992
Event: 6th International Conference on Supercomputing, ICS 1992 - Washington, United States
Duration: Jul 19 1992 to Jul 24 1992

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: Part F129617

Other

Other: 6th International Conference on Supercomputing, ICS 1992
Country: United States
City: Washington
Period: 7/19/92 to 7/24/92

Fingerprint

Supercomputers
Hardware
Data storage equipment

Keywords

  • Data dependence analysis
  • Load latency
  • Register file
  • Register preload
  • VLIW/superscalar processor

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Chen, W. Y., Mahlke, S. A., Hwu, W-M. W., Kiyohara, T., & Chang, P. P. (1992). Tolerating data access latency with register preloading. In Proceedings of the 6th International Conference on Supercomputing, ICS 1992 (pp. 104-113). (Proceedings of the International Conference on Supercomputing; Vol. Part F129617). Association for Computing Machinery. https://doi.org/10.1145/143369.143394

@inproceedings{7ac99613e7b64c7ba9dfc408dd5a169b,
title = "Tolerating data access latency with register preloading",
abstract = "By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.",
keywords = "Data dependence analysis, Load latency, Register file, Register preload, VLIW/superscalar processor",
author = "Chen, {William Y.} and Mahlke, {Scott A.} and Hwu, {Wen-Mei W} and Tokuzo Kiyohara and Chang, {Pohua P.}",
year = "1992",
month = "8",
day = "1",
doi = "10.1145/143369.143394",
language = "English (US)",
series = "Proceedings of the International Conference on Supercomputing",
publisher = "Association for Computing Machinery",
pages = "104--113",
booktitle = "Proceedings of the 6th International Conference on Supercomputing, ICS 1992",

}

TY - GEN

T1 - Tolerating data access latency with register preloading

AU - Chen, William Y.

AU - Mahlke, Scott A.

AU - Hwu, Wen-Mei W

AU - Kiyohara, Tokuzo

AU - Chang, Pohua P.

PY - 1992/8/1

Y1 - 1992/8/1

N2 - By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.

KW - Data dependence analysis

KW - Load latency

KW - Register file

KW - Register preload

KW - VLIW/superscalar processor

UR - http://www.scopus.com/inward/record.url?scp=33646901785&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646901785&partnerID=8YFLogxK

U2 - 10.1145/143369.143394

DO - 10.1145/143369.143394

M3 - Conference contribution

AN - SCOPUS:33646901785

T3 - Proceedings of the International Conference on Supercomputing

SP - 104

EP - 113

BT - Proceedings of the 6th International Conference on Supercomputing, ICS 1992

PB - Association for Computing Machinery

ER -