Abstract

As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called the Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully-scheduled kernels, the compiler generates a sequential form of the modulo scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while-loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally, hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than the whole loop invocation.
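
The relationship between a sequential, stage-ordered loop body and the prologue, kernel, and epilogue can be illustrated with a small software sketch. This is not the hardware mechanism described in the paper. It only assumes a generic modulo schedule in which a new iteration starts every initiation interval and each iteration passes through a fixed number of stages. The names `expand_modulo_schedule`, `phase`, `num_stages`, and `num_iterations` are hypothetical, chosen for illustration.

```python
def expand_modulo_schedule(num_stages, num_iterations):
    """List, for each time step, the (iteration, stage) pairs that are active.

    One time step stands for one initiation interval. Iteration i starts at
    step i and runs stage s at step i + s (an assumption of this sketch).
    """
    total_steps = num_iterations + num_stages - 1
    rows = []
    for t in range(total_steps):
        active = [(i, t - i)
                  for i in range(num_iterations)
                  if 0 <= t - i < num_stages]
        rows.append(active)
    return rows


def phase(t, num_stages, num_iterations):
    """Classify time step t, assuming num_iterations >= num_stages."""
    if t < num_stages - 1:
        return "prologue"   # pipeline still filling: not all stages active
    if t >= num_iterations:
        return "epilogue"   # pipeline draining: no new iterations start
    return "kernel"         # steady state: every stage is active


if __name__ == "__main__":
    stages, iterations = 3, 5
    for t, active in enumerate(expand_modulo_schedule(stages, iterations)):
        print(t, phase(t, stages, iterations), active)
```

In this sketch only the stage ordering of a single iteration is given as input, and the fill and drain steps fall out of the overlap rule, which mirrors the idea of deriving the prologue and epilogue rather than emitting them as separate code.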

Original language: English (US)
Pages (from-to): 138-149
Number of pages: 12
Journal: Proceedings of the Annual International Symposium on Microarchitecture
State: Published - 2001

Fingerprint

Hardware
Signal processing
Scheduling

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software

Cite this

Modulo Schedule Buffers. / Merten, Matthew C.; Hwu, Wen-Mei W.

In: Proceedings of the Annual International Symposium on Microarchitecture, 2001, p. 138-149.

Research output: Contribution to journal (Article)

@article{4823c5bdc1244b3780701612d7819b1a,
title = "Modulo Schedule Buffers",
abstract = "As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called the Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully-scheduled kernels, the compiler generates a sequential form of the modulo scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while-loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally, hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than the whole loop invocation.",
author = "Merten, {Matthew C.} and Hwu, {Wen-Mei W.}",
year = "2001",
language = "English (US)",
pages = "138--149",
journal = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
issn = "1072-4451",

}

TY - JOUR

T1 - Modulo Schedule Buffers

AU - Merten, Matthew C.

AU - Hwu, Wen-Mei W.

PY - 2001

Y1 - 2001

AB - As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called the Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully-scheduled kernels, the compiler generates a sequential form of the modulo scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while-loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally, hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than the whole loop invocation.

UR - http://www.scopus.com/inward/record.url?scp=0035691302&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035691302&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0035691302

SP - 138

EP - 149

JO - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

JF - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SN - 1072-4451

ER -