Simulation study of simultaneous vector prefetch performance in multiprocessor memory subsystems

Wen-mei W. Hwu, Thomas M. Conte

Research output: Contribution to journal › Conference article › peer-review


The Cedar multiprocessor is composed of clusters of K computational elements (CEs) (currently, K = 8), where each cluster is a modified Alliant FX/8 mini-supercomputer. The global memory subsystem is composed of two unidirectional N × N Omega networks and N memory units (MUs) for the N processors in the system. The worst case was determined to occur when all processors simultaneously request the same L-length vector. Vector prefetch performance has been estimated by the compiler using the relation that the maximum vector prefetch latency equals the maximum inverse bandwidth times the vector length. The memory units were identified as the bottleneck for the worst-case operation of the global memory subsystem. Increasing buffering in the switching elements had little effect, nor did increasing buffering in the memory units result in increased performance.
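The latency relation quoted in the abstract can be illustrated with a minimal sketch. The function name, the choice of units (cycles per element), and the numeric values below are assumptions made for illustration only; they are not taken from the paper.

```python
def max_prefetch_latency(max_inverse_bandwidth: float, vector_length: int) -> float:
    """Estimate the maximum vector prefetch latency.

    Implements the relation stated in the abstract:
        max latency = max inverse bandwidth * vector length

    max_inverse_bandwidth: assumed here to be in cycles per vector element.
    vector_length: the vector length L, in elements.
    """
    if max_inverse_bandwidth < 0 or vector_length < 0:
        raise ValueError("inputs must be non-negative")
    return max_inverse_bandwidth * vector_length


# Hypothetical values: 2 cycles per element, a 32-element vector.
estimate = max_prefetch_latency(2.0, 32)
print(estimate)
```

With these hypothetical inputs the estimate is 64 cycles. The value for the maximum inverse bandwidth of the actual Cedar memory subsystem would have to come from the paper's measurements.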

Original language: English (US)
Pages (from-to): 227
Number of pages: 1
Journal: Performance Evaluation Review
Issue number: 1
State: Published - May 1989
Event: ACM Sigmetrics and Performance '89 International Conference on Measurement and Modeling of Computer Systems - Proceedings - Berkeley, CA, USA
Duration: May 23 1989 → May 26 1989

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

