"Flea-flicker" Multipass pipelining: An alternative to the high-power out-of-order offense

Ronald D. Barnes, Shane Ryoo, Wen-Mei W. Hwu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

As microprocessor designs become increasingly power- and complexity-conscious, future microarchitectures must decrease their reliance on expensive dynamic scheduling structures. While compilers have generally proven adept at planning useful static instruction-level parallelism, relying solely on the compiler's instruction execution arrangement performs poorly when cache misses occur, because variable latency is not well tolerated. This paper proposes a new microarchitectural model, multipass pipelining, that exploits meticulous compile-time scheduling on simple in-order hardware while achieving excellent cache miss tolerance through persistent advance preexecution beyond otherwise stalled instructions. The pipeline systematically makes multiple passes through instructions that follow a stalled instruction. Each pass increases the speed and energy efficiency of the subsequent ones by preserving computed results. The concept of multiple passes and successive improvement of efficiency across passes in a single pipeline distinguishes multipass pipelining from other runahead schemes. Simulation results show that the multipass technique achieves 77% of the cycle reduction of aggressive out-of-order execution relative to in-order execution. In addition, microarchitectural-level power simulation indicates that benefits of multipass are achieved at a fraction of the power overhead of full dynamic scheduling.
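The "77% of the cycle reduction" figure can be read as follows: take the cycles saved by out-of-order execution relative to in-order execution, and multipass recovers 77% of that saving. The short Python sketch below works through this arithmetic. The cycle counts (1000 and 600) and the function name `multipass_cycles` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def multipass_cycles(in_order_cycles: float, ooo_cycles: float,
                     fraction: float = 0.77) -> float:
    """Cycle count of a design that recovers `fraction` of the
    out-of-order cycle reduction, measured against in-order execution."""
    reduction = in_order_cycles - ooo_cycles  # cycles saved by out-of-order
    return in_order_cycles - fraction * reduction


# Hypothetical cycle counts for one benchmark (illustration only).
in_order = 1000.0
out_of_order = 600.0

mp = multipass_cycles(in_order, out_of_order)
print(f"multipass cycles: {mp:.0f}")                   # 692
print(f"speedup over in-order: {in_order / mp:.2f}x")  # 1.45x
```

Under these assumed numbers, out-of-order execution saves 400 cycles, so a design that recovers 77% of that saving removes 308 cycles and finishes in 692.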

Original language: English (US)
Title of host publication: MICRO-38
Subtitle of host publication: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture
Pages: 319-330
Number of pages: 12
DOIs: https://doi.org/10.1109/MICRO.2005.1
State: Published - Dec 1 2005
Event: MICRO-38: 38th Annual IEEE/ACM International Symposium on Microarchitecture - Barcelona, Spain
Duration: Nov 12 2005 to Nov 16 2005

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451

Other

Other: MICRO-38: 38th Annual IEEE/ACM International Symposium on Microarchitecture
Country: Spain
City: Barcelona
Period: 11/12/05 to 11/16/05

Fingerprint

Scheduling
Pipelines
Energy efficiency
Microprocessor chips
Hardware
Planning

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Barnes, R. D., Ryoo, S., & Hwu, W-M. W. (2005). "Flea-flicker" Multipass pipelining: An alternative to the high-power out-of-order offense. In MICRO-38: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 319-330). [1540970] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). https://doi.org/10.1109/MICRO.2005.1

@inproceedings{aefd39ba0e3449f9b2c917358952235b,
title = "{"}Flea-flicker{"} Multipass pipelining: An alternative to the high-power out-of-order offense",
author = "Barnes, {Ronald D.} and Ryoo, Shane and Hwu, {Wen-Mei W.}",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/MICRO.2005.1",
language = "English (US)",
isbn = "0769524400",
series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
pages = "319--330",
booktitle = "MICRO-38",

}

TY - GEN

T1 - "Flea-flicker" Multipass pipelining

T2 - An alternative to the high-power out-of-order offense

AU - Barnes, Ronald D.

AU - Ryoo, Shane

AU - Hwu, Wen-Mei W

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33644900150&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33644900150&partnerID=8YFLogxK

U2 - 10.1109/MICRO.2005.1

DO - 10.1109/MICRO.2005.1

M3 - Conference contribution

AN - SCOPUS:33644900150

SN - 0769524400

SN - 9780769524405

T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SP - 319

EP - 330

BT - MICRO-38

ER -