Abstract
Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput of a modulo scheduled loop depends on the resource requirements, the dependence pattern, and the register requirements of the computation in the loop body. Traditionally, unrolling followed by acyclic scheduling of the unrolled body has been an alternative to modulo scheduling. However, there are benefits to unrolling even if the loop is to be modulo scheduled. Unrolling can improve the throughput by allowing a smaller non-integral effective initiation interval to be achieved. After unrolling, optimizations can be applied to the loop that reduce both the resource requirements and the height of the critical paths. Together, unrolling and unrolling-based optimizations can enable the completion of multiple iterations per cycle in some cases. This paper describes the benefits of unrolling and a set of optimizations for unrolled loops which have been implemented in the IMPACT compiler. The performance benefits of unrolling for five of the SPEC CFP92 programs are reported.
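The point about a non-integral effective initiation interval can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes a simplified machine with a fixed number of uniform issue slots per cycle, considers only the resource-constrained lower bound (recurrences and register pressure are ignored), and uses the hypothetical helper name `effective_ii`. The initiation interval of the unrolled body must be an integer number of cycles, but dividing it by the unroll factor gives a fractional cost per original iteration.

```python
from fractions import Fraction


def effective_ii(ops_per_iteration: int, issue_slots: int, unroll_factor: int) -> Fraction:
    """Resource-only lower bound on cycles per ORIGINAL loop iteration.

    Assumes every operation can use any of the `issue_slots` slots
    available each cycle (an illustrative simplification).
    """
    total_ops = ops_per_iteration * unroll_factor
    # The II of the unrolled body must be a whole number of cycles,
    # so round up using integer arithmetic.
    unrolled_ii = -(-total_ops // issue_slots)
    # Spread that II over the iterations contained in the unrolled body.
    return Fraction(unrolled_ii, unroll_factor)


# 5 operations per iteration on a 2-issue machine (hypothetical numbers).
print(effective_ii(5, 2, 1))  # without unrolling
print(effective_ii(5, 2, 2))  # unrolled twice

# 3 operations per iteration on an 8-issue machine (hypothetical numbers).
print(effective_ii(3, 8, 2))
```

Under these assumptions, the 5-operation loop needs 3 cycles per iteration without unrolling but only 5/2 cycles per iteration when unrolled twice, and the 3-operation loop on the 8-issue machine reaches 1/2 cycle per iteration, that is, two iterations completed per cycle.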
Original language | English (US) |
---|---|
Pages (from-to) | 327-337 |
Number of pages | 11 |
Journal | Proceedings of the Annual International Symposium on Microarchitecture |
State | Published - Dec 1 1995 |
Event | Proceedings of the 1995 28th Annual International Symposium on Microarchitecture - Ann Arbor, MI, USA |
Duration | Nov 29 1995 → Dec 1 1995 |
ASJC Scopus subject areas
- Hardware and Architecture
- Software
Cite this
Unrolling-based optimizations for modulo scheduling. / Lavery, Daniel M.; Hwu, Wen-mei W.
In: Proceedings of the Annual International Symposium on Microarchitecture, 01.12.1995, p. 327-337.
Research output: Contribution to journal › Conference article
TY - JOUR
T1 - Unrolling-based optimizations for modulo scheduling
AU - Lavery, Daniel M.
AU - Hwu, Wen-mei W.
PY - 1995/12/1
Y1 - 1995/12/1
UR - http://www.scopus.com/inward/record.url?scp=0029487787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029487787&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:0029487787
SP - 327
EP - 337
JO - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
JF - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SN - 1072-4451
ER -