TY - GEN
T1 - Data forwarding in scalable shared-memory multiprocessors
AU - Koufaty, D. A.
AU - Chen, X.
AU - Poulsen, D. K.
AU - Torrellas, J.
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grants NSF Young Investigator Award MIP 94-57436 and MIP 93-08098; by NASA Contract No. NAG-1-613; and by a grant from Intel.
Funding Information:
We thank the referees for their feedback. David Koufaty is supported by Universidad Simón Bolívar and CONICIT, both of Venezuela. Josep Torrellas is partly supported by an NSF Young Investigator Award.
Publisher Copyright:
© 1995 ACM.
PY - 1995/7/3
Y1 - 1995/7/3
AB - Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its cache, it sends a copy of the datum to the caches of the processors that the compiler identified as consumers of it. As a result, when the consumer processors access the datum, they find it in their caches. This paper addresses two main issues. First, it presents a framework for a compiler algorithm for forwarding. Second, using address traces, it evaluates the performance impact of different levels of support for forwarding. Our simulations of a 32-processor machine show that, on average, slightly optimistic support for forwarding speeds up five applications by 50% for large caches and 30% for small caches. For large caches, most read-sharing misses can be eliminated, while for small caches, forwarding rarely increases the number of conflict misses. Overall, support for forwarding in shared-memory multiprocessors promises to deliver good application speedups.
UR - http://www.scopus.com/inward/record.url?scp=0029180738&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029180738&partnerID=8YFLogxK
U2 - 10.1145/224538.224569
DO - 10.1145/224538.224569
M3 - Conference contribution
AN - SCOPUS:0029180738
T3 - Proceedings of the International Conference on Supercomputing
SP - 255
EP - 264
BT - Proceedings of the 9th International Conference on Supercomputing, ICS 1995
PB - Association for Computing Machinery
T2 - 9th International Conference on Supercomputing, ICS 1995
Y2 - 3 July 1995 through 7 July 1995
ER -