Continuous optimization

Brian Fahs, Todd Rafacz, Sanjay Jeram Patel, Steven Sam Lumetta

Research output: Contribution to journal › Conference article

Abstract

This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and MediaBench workloads reveals that a hardware optimizer can directly execute 33% of instructions, resolve 29% of mispredicted branches, and generate addresses for 76% of memory operations. These positive effects combine to provide speedups in the range 0.99 to 1.27.
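To make the abstract's idea concrete, here is a minimal software sketch of table-based dataflow optimization over a toy instruction stream: a single pass tracks known register values in a table (analogous to the rename-stage hardware) so that instructions whose inputs are all known are executed in the optimizer (constant propagation), and stores that rewrite a value a location already holds are dropped (silent-store removal). All names (`Instr`, `optimize_stream`, the toy ISA) are illustrative assumptions, not structures from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    op: str                              # 'movi', 'add', or 'store'
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    imm: Optional[int] = None

def optimize_stream(stream):
    """Single pass over the stream, mimicking a table-based optimizer:
    instructions with fully known inputs are executed here, leaving
    less work for the out-of-order core; silent stores are removed."""
    reg_table = {}                       # register -> known constant value
    mem_table = {}                       # address  -> last value stored there
    remaining = []                       # instructions still needing the core
    executed = 0                         # executed directly in the optimizer
    for ins in stream:
        if ins.op == 'movi':             # immediate load: value known now
            reg_table[ins.dst] = ins.imm
            executed += 1
        elif ins.op == 'add':
            a, b = ins.srcs
            if a in reg_table and b in reg_table:
                reg_table[ins.dst] = reg_table[a] + reg_table[b]
                executed += 1            # constant propagation
            else:
                reg_table.pop(ins.dst, None)   # result unknown downstream
                remaining.append(ins)
        elif ins.op == 'store':
            addr_reg, src = ins.srcs
            if addr_reg in reg_table and src in reg_table:
                addr, val = reg_table[addr_reg], reg_table[src]
                if mem_table.get(addr) == val:
                    continue             # silent store: drop it entirely
                mem_table[addr] = val
            remaining.append(ins)
        else:
            remaining.append(ins)
    return remaining, reg_table, executed

# Toy stream: two known adds feed a store; the second store is silent,
# and the final add depends on an unknown register r5, so it survives.
stream = [
    Instr('movi', dst='r1', imm=4),
    Instr('movi', dst='r2', imm=6),
    Instr('add', dst='r3', srcs=('r1', 'r2')),
    Instr('movi', dst='r10', imm=100),
    Instr('store', srcs=('r10', 'r3')),
    Instr('store', srcs=('r10', 'r3')),      # silent: same value, same address
    Instr('add', dst='r4', srcs=('r3', 'r5')),  # r5 unknown: left for the core
]
remaining, regs, executed = optimize_stream(stream)
```

Of the seven input instructions, four are executed inside the optimizer and one is removed as a silent store, so only two reach the (simulated) out-of-order core, illustrating how the paper's optimizer reduces downstream work.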

Original language: English (US)
Pages (from-to): 86-97
Number of pages: 12
Journal: Proceedings - International Symposium on Computer Architecture
State: Published - Nov 10 2005
Event: 32nd International Symposium on Computer Architecture, ISCA 2005 - Madison, WI, United States
Duration: Jun 4 2005 - Jun 8 2005


ASJC Scopus subject areas

  • Engineering (all)

Cite this

Continuous optimization. / Fahs, Brian; Rafacz, Todd; Patel, Sanjay Jeram; Lumetta, Steven Sam.

In: Proceedings - International Symposium on Computer Architecture, 10.11.2005, p. 86-97.


@article{b3522222452b442d81971c150ccade0c,
title = "Continuous optimization",
abstract = "This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and mediabench workloads reveals that a hardware optimizer can directly execute 33{\%} of instructions, resolve 29{\%} of mispredicted branches, and generate addresses for 76{\%} of memory operations. These positive effects combine to provide speed ups in the range 0.99 to 1.27.",
author = "Brian Fahs and Todd Rafacz and Patel, {Sanjay Jeram} and Lumetta, {Steven Sam}",
year = "2005",
month = nov,
day = "10",
language = "English (US)",
pages = "86--97",
journal = "Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA",
issn = "1063-6897",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Continuous optimization

AU - Fahs, Brian

AU - Rafacz, Todd

AU - Patel, Sanjay Jeram

AU - Lumetta, Steven Sam

PY - 2005/11/10

Y1 - 2005/11/10

N2 - This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and mediabench workloads reveals that a hardware optimizer can directly execute 33% of instructions, resolve 29% of mispredicted branches, and generate addresses for 76% of memory operations. These positive effects combine to provide speed ups in the range 0.99 to 1.27.

AB - This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and mediabench workloads reveals that a hardware optimizer can directly execute 33% of instructions, resolve 29% of mispredicted branches, and generate addresses for 76% of memory operations. These positive effects combine to provide speed ups in the range 0.99 to 1.27.

UR - http://www.scopus.com/inward/record.url?scp=27544446445&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27544446445&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:27544446445

SP - 86

EP - 97

JO - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA

JF - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA

SN - 1063-6897

ER -