Compiler code transformations for superscalar-based high-performance systems

Scott A. Mahlke, William Y. Chen, John C. Gyuenhaal, Wen-Mei W Hwu, Pohua P. Chang, Tokuzo Kiyohara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar/VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required in order for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that the six additional transformations studied satisfy most of this need.
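
As a rough illustration of the two transformations the abstract names, loop unrolling and register renaming, the following C sketch applies them by hand to a simple DOALL-style loop. It is not taken from the paper: the function names `axpy_rolled` and `axpy_unrolled`, the unroll factor of 4, and the loop body are all assumptions chosen for the example, and the `restrict` qualifiers encode the further assumption that the output array does not overlap the inputs.

```c
#include <stddef.h>

/* Rolled form: a DOALL-style loop, because no iteration reads a value
   written by another iteration. (Hypothetical example, not from the paper.) */
void axpy_rolled(size_t n, double s,
                 const double *restrict a, const double *restrict b,
                 double *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = s * a[i] + b[i];
    }
}

/* Unrolled by an assumed factor of 4. Each copy of the body uses its own
   temporary (t0..t3), a source-level stand-in for register renaming:
   the four computations share no name, so nothing forces them to be
   issued one after another. */
void axpy_unrolled(size_t n, double s,
                   const double *restrict a, const double *restrict b,
                   double *restrict c)
{
    size_t i = 0;

    /* Main unrolled loop: four independent iterations per trip. */
    for (; i + 4 <= n; i += 4) {
        double t0 = s * a[i]     + b[i];
        double t1 = s * a[i + 1] + b[i + 1];
        double t2 = s * a[i + 2] + b[i + 2];
        double t3 = s * a[i + 3] + b[i + 3];
        c[i]     = t0;
        c[i + 1] = t1;
        c[i + 2] = t2;
        c[i + 3] = t3;
    }

    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; i++) {
        c[i] = s * a[i] + b[i];
    }
}
```

Both functions compute the same result; the unrolled form simply exposes four independent operations per loop trip to a scheduler. The six additional transformations for serial and DOACROSS loops are not named in this abstract, so they are not sketched here.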

Original language: English (US)
Title of host publication: Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992
Editors: Robert Werner
Publisher: Association for Computing Machinery
Pages: 808-817
Number of pages: 10
ISBN (Electronic): 0818626305
State: Published - Dec 1 1992
Event: 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992 - Minneapolis, United States
Duration: Nov 16 1992 to Nov 20 1992

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: Part F129723

Other

Other: 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992
Country: United States
City: Minneapolis
Period: 11/16/92 to 11/20/92

Fingerprint

Supercomputers

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Mahlke, S. A., Chen, W. Y., Gyuenhaal, J. C., Hwu, W-M. W., Chang, P. P., & Kiyohara, T. (1992). Compiler code transformations for superscalar-based high-performance systems. In R. Werner (Ed.), Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992 (pp. 808-817). (Proceedings of the International Conference on Supercomputing; Vol. Part F129723). Association for Computing Machinery.

Compiler code transformations for superscalar-based high-performance systems. / Mahlke, Scott A.; Chen, William Y.; Gyuenhaal, John C.; Hwu, Wen-Mei W; Chang, Pohua P.; Kiyohara, Tokuzo.

Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992. ed. / Robert Werner. Association for Computing Machinery, 1992. p. 808-817 (Proceedings of the International Conference on Supercomputing; Vol. Part F129723).

Mahlke, SA, Chen, WY, Gyuenhaal, JC, Hwu, W-MW, Chang, PP & Kiyohara, T 1992, Compiler code transformations for superscalar-based high-performance systems. in R Werner (ed.), Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992. Proceedings of the International Conference on Supercomputing, vol. Part F129723, Association for Computing Machinery, pp. 808-817, 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992, Minneapolis, United States, 11/16/92.
Mahlke SA, Chen WY, Gyuenhaal JC, Hwu W-MW, Chang PP, Kiyohara T. Compiler code transformations for superscalar-based high-performance systems. In Werner R, editor, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992. Association for Computing Machinery. 1992. p. 808-817. (Proceedings of the International Conference on Supercomputing).
Mahlke, Scott A. ; Chen, William Y. ; Gyuenhaal, John C. ; Hwu, Wen-Mei W ; Chang, Pohua P. ; Kiyohara, Tokuzo. / Compiler code transformations for superscalar-based high-performance systems. Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992. editor / Robert Werner. Association for Computing Machinery, 1992. pp. 808-817 (Proceedings of the International Conference on Supercomputing).
@inproceedings{5ee9076e25574e5895290faa747606a8,
title = "Compiler code transformations for superscalar-based high-performance systems",
abstract = "Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar/VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required in order for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that the six additional transformations studied satisfy most of this need.",
author = "Mahlke, {Scott A.} and Chen, {William Y.} and Gyuenhaal, {John C.} and Hwu, {Wen-Mei W} and Chang, {Pohua P.} and Tokuzo Kiyohara",
year = "1992",
month = "12",
day = "1",
language = "English (US)",
series = "Proceedings of the International Conference on Supercomputing",
publisher = "Association for Computing Machinery",
pages = "808--817",
editor = "Robert Werner",
booktitle = "Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992",

}

TY - GEN
T1 - Compiler code transformations for superscalar-based high-performance systems
AU - Mahlke, Scott A.
AU - Chen, William Y.
AU - Gyuenhaal, John C.
AU - Hwu, Wen-Mei W
AU - Chang, Pohua P.
AU - Kiyohara, Tokuzo
PY - 1992/12/1
Y1 - 1992/12/1
N2 - Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar/VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required in order for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that the six additional transformations studied satisfy most of this need.
AB - Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar/VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required in order for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that the six additional transformations studied satisfy most of this need.
UR - http://www.scopus.com/inward/record.url?scp=47349091741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47349091741&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:47349091741
T3 - Proceedings of the International Conference on Supercomputing
SP - 808
EP - 817
BT - Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992
A2 - Werner, Robert
PB - Association for Computing Machinery
ER -