Abstract
This study characterizes (1) the global memory and interconnection network contention overhead, (2) the operating system overheads, and (3) the runtime system parallelization overheads for the Cedar shared-memory multiprocessor. The measurements were obtained using five representative compute-intensive, scientific, loop-parallel applications from the Perfect Benchmark Suite. The overheads were measured for a range of Cedar configurations, from 1 processor to the full 4-cluster/32-processor configuration, thus characterizing how the overheads change as the machine scales. For the full 4-cluster Cedar, the operating system overhead was found to constitute 5-21% of the total completion time of an application, the parallelization overhead 10-25%, and the overhead due to global memory and network contention 8-21%.
Original language | English (US) |
---|---|
Pages (from-to) | 71-80 |
Number of pages | 10 |
Journal | Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA |
State | Published - 1994 |
Event | Proceedings of the 21st Annual International Symposium on Computer Architecture - Chicago, IL, USA; Duration: Apr 18 1994 → Apr 21 1994 |
ASJC Scopus subject areas
- Hardware and Architecture