TY - GEN
T1 - Clustered approach to multithreaded processors
AU - Krishnan, Venkata
AU - Torrellas, Josep
PY - 1998
Y1 - 1998
AB - With aggressive superscalar processors delivering diminishing returns, alternate designs that make good use of the increasing chip densities are actively being explored. One such approach is simultaneous multithreading (SMT), where a conventional superscalar supports multiple threads such that instructions from different threads may be issued in a single cycle. Another approach is the on-chip multiprocessor and its variants. Unlike the SMT approach, all the resources have fixed assignment (FA) in this architecture. The design simplicity of the FA approach enables high clock frequencies, while the flexibility of the SMT approach allows it to adapt to the specific thread- and instruction-level parallelism of the application. Unfortunately, the strict partitioning of resources among various processors in the FA architecture may result in under-utilization of the chip, while the fully centralized structure of the SMT may result in a longer clock cycle-time. In this paper, we explore a hybrid design, where a chip is composed of a set of SMT processors. We evaluate such a clustered architecture running parallel applications. We consider both a low-end machine with only one processor chip on which to run multiple threads as well as a high-end machine with several processor chips working on the same application. Overall, we conclude that such a hybrid processor represents a good performance-complexity design point.
UR - http://www.scopus.com/inward/record.url?scp=0031652125&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031652125&partnerID=8YFLogxK
U2 - 10.1109/IPPS.1998.669992
DO - 10.1109/IPPS.1998.669992
M3 - Conference contribution
AN - SCOPUS:0031652125
SN - 0818684046
T3 - Proceedings of the International Parallel Processing Symposium, IPPS
SP - 627
EP - 634
BT - Proceedings of the International Parallel Processing Symposium, IPPS
T2 - Proceedings of the 1998 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing
Y2 - 30 March 1998 through 3 April 1998
ER -