TY - JOUR
T1 - The performance of the Cedar multistage switching network
AU - Torrellas, Josep
AU - Zhang, Zheng
N1 - Funding Information:
Josep Torrellas is supported in part by U.S. National Science Foundation Young Investigator Award MIP 94-57436. Other support was provided by the U.S. National Science Foundation under grants RIA MIP 93-08098, MIP 93-07910, and MIP 89-20891; NASA Contract No. NAG-1-613; and a grant from the University of Illinois Research Board.
PY - 1997
Y1 - 1997
AB - While multistage switching networks for vector multiprocessors have been studied extensively, detailed evaluations of their performance are rare. Indeed, analytical models, simulations with pseudosynthetic loads, studies focused on average-value parameters, and measurements of networks disconnected from the machine, all provide limited information. In this paper, instead, we present an in-depth empirical analysis of a multistage switching network in a realistic setting: We use hardware probes to examine the performance of the omega network of the Cedar shared-memory machine executing real applications. The machine is configured with 16 vector processors. The analysis suggests that the performance of multistage switching networks is limited by traffic nonuniformities. We identify two major nonuniformities that degrade Cedar's performance and are likely to slow down other networks too. The first one is the contention caused by the return messages in a vector access as they converge from the memories to one processor port. This traffic convergence penalizes vector reads and, more importantly, causes tree saturation. The second nonuniformity is the uneven contention delays induced by a relatively fair scheme to resolve message collisions. Based on our observations, we argue that intuitive optimizations for multistage switching networks may not be the most cost-effective ones. Instead, we suggest changes to increase the network bandwidth at the root of the traffic convergence tree and to delay traffic convergence up until the final stages of the network.
KW - Address tracing
KW - Experimental analysis
KW - Multistage switching networks
KW - Performance evaluation
KW - Vector multiprocessors
UR - http://www.scopus.com/inward/record.url?scp=0031117384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031117384&partnerID=8YFLogxK
U2 - 10.1109/71.588598
DO - 10.1109/71.588598
M3 - Article
AN - SCOPUS:0031117384
VL - 8
SP - 321
EP - 336
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
IS - 4
ER -