Abstract
The Cedar multiprocessor is composed of clusters of K computational elements (CEs) (currently, K = 8), where each cluster is a modified Alliant FX/8 mini-supercomputer. The global memory subsystem is composed of two unidirectional N × N Omega networks and N memory units (MUs) for the N processors in the system. The worst-case scenario was determined to be when all processors simultaneously request the same L-length vector. The performance of compiler-generated vector prefetches has been estimated using the relation that maximum vector prefetch latency equals maximum inverse bandwidth times vector length. The memory units were identified as the bottleneck for the worst-case operation of the global memory subsystem. Increasing buffering in the switching elements had little effect, nor did increasing buffering in the memory units result in increased performance.
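The latency relation quoted in the abstract can be illustrated with a minimal sketch. The function name, parameter names, and numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper, which does not report them here.

```python
def max_prefetch_latency(max_inverse_bandwidth: float, vector_length: int) -> float:
    """Estimate the maximum vector prefetch latency.

    Implements the relation stated in the abstract:
        max latency = max inverse bandwidth * vector length
    The result is in the same time unit as max_inverse_bandwidth
    (for example, cycles per word gives cycles).
    """
    if max_inverse_bandwidth < 0 or vector_length < 0:
        raise ValueError("inputs must be non-negative")
    return max_inverse_bandwidth * vector_length


# Hypothetical example values (assumptions, not measurements from the paper):
# 2.0 cycles per word in the worst case, and a vector of L = 32 words.
estimate = max_prefetch_latency(2.0, 32)
print(estimate)
```

Under this relation the estimate scales linearly with the vector length L, so doubling L doubles the worst-case latency bound for a fixed inverse bandwidth.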
Original language | English (US) |
---|---|
Pages (from-to) | 227 |
Number of pages | 1 |
Journal | Performance Evaluation Review |
Volume | 17 |
Issue number | 1 |
State | Published - May 1989 |
Event | ACM Sigmetrics and Performance '89 International Conference on Measurement and Modeling of Computer Systems - Proceedings - Berkeley, CA, USA; Duration: May 23 1989 → May 26 1989 |
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications