Beating in-order stalls with "flea-flicker" two-pass pipelining

R. D. Barnes, S. J. Patel, E. M. Nystrom, N. Navarro, J. W. Sias, W. W. Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-order microarchitecture design and compiler development. Compilers can generate schedules with a high degree of instruction-level parallelism but cannot effectively accommodate unanticipated latencies; incorporating traditional out-of-order execution into the microarchitecture hides some of this latency but redundantly performs work done by the compiler and adds additional pipeline stages. Although effective techniques, such as prefetching and threading, have been proposed to deal with anticipable, long latency misses, the shorter, more diffuse stalls due to difficult-to-anticipate, first- or second-level misses are less easily hidden on in-order architectures. This paper addresses this problem by proposing a microarchitectural technique, referred to as two-pass pipelining, wherein the program executes on two in-order back-end pipelines coupled by a queue. The "advance" pipeline executes instructions greedily, without stalling on unanticipated latency dependences (executing independent instructions while otherwise blocking instructions are deferred). The "backup" pipeline allows concurrent resolution of instructions that were deferred in the other pipeline, resulting in the absorption of shorter misses and the overlap of longer ones. This paper argues that this design is both achievable and a good use of transistor resources and shows results indicating that it can deliver significant speedups for in-order processor designs.
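The mechanism described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under strong assumptions, not the authors' simulator or design: every name here (`Instr`, `in_order_cycles`, `two_pass_cycles`, `queue_delay`) is hypothetical, each pipeline issues one instruction per cycle, the coupling queue is unbounded, every register is written exactly once, and memory ordering and store handling are ignored. It compares a single in-order pipeline that stalls on unready sources with an advance pipeline that defers blocked instructions to a backup pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str       # destination register (assumed written only once)
    srcs: tuple     # source registers
    latency: int    # cycles from issue until dest becomes readable


def in_order_cycles(program):
    """Single in-order pipeline: stall at issue until all sources are ready."""
    ready = {}      # register -> cycle at which its value is readable
    cycle = 0
    for ins in program:
        issue = max([cycle] + [ready.get(s, 0) for s in ins.srcs])
        ready[ins.dest] = issue + ins.latency
        cycle = issue + 1
    return max([cycle] + list(ready.values()))


def two_pass_cycles(program, queue_delay=1):
    """Toy two-pass model: advance pass never stalls, backup pass resolves deferrals."""
    ready = {}
    deferred_regs = set()
    queue = []      # (advance cycle, instruction, was_deferred), in program order

    # Advance pipeline: one instruction per cycle, never stalling.
    for cycle, ins in enumerate(program):
        blocked = any(
            s in deferred_regs or ready.get(s, 0) > cycle for s in ins.srcs
        )
        if blocked:
            deferred_regs.add(ins.dest)     # consumers of dest must defer too
        else:
            ready[ins.dest] = cycle + ins.latency
        queue.append((cycle, ins, blocked))

    # Backup pipeline: drains the queue in order, executing deferred instructions.
    b_cycle = 0
    for a_cycle, ins, was_deferred in queue:
        issue = max(b_cycle, a_cycle + queue_delay)
        if was_deferred:
            issue = max([issue] + [ready.get(s, 0) for s in ins.srcs])
            ready[ins.dest] = issue + ins.latency
        b_cycle = issue + 1
    return max([b_cycle] + list(ready.values()))


# Two independent 10-cycle loads, each followed by a dependent instruction.
program = [
    Instr("r1", (), 10),
    Instr("r2", ("r1",), 1),
    Instr("r3", (), 10),
    Instr("r4", ("r3",), 1),
    Instr("r5", ("r2", "r4"), 1),
]

print(in_order_cycles(program), two_pass_cycles(program))  # prints "23 14"
```

In this example the in-order pipeline serializes the two load latencies, while the advance pass starts the second load before the first has returned, so the two latencies overlap. The numbers say nothing about the speedups reported in the paper; they only show the overlap effect in this simplified model.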

Original language: English (US)
Title of host publication: Proceedings - 36th International Symposium on Microarchitecture, MICRO 2003
Publisher: IEEE Computer Society
Number of pages: 12
ISBN (Electronic): 076952043X
State: Published - 2003
Event: 36th International Symposium on Microarchitecture, MICRO 2003 - San Diego, United States
Duration: Dec 3 2003 → Dec 5 2003

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451


Other: 36th International Symposium on Microarchitecture, MICRO 2003
Country/Territory: United States
City: San Diego


  • Computer aided instruction
  • Delay
  • Microarchitecture
  • Out of order
  • Parallel processing
  • Pipeline processing
  • Process design
  • Processor scheduling
  • Registers
  • Runtime

ASJC Scopus subject areas

  • Hardware and Architecture

