Abstract
Prefetching is a widely used consumer-initiated mechanism to hide communication latency in shared-memory multiprocessors. However, prefetching is inapplicable or insufficient for some communication patterns such as irregular communication, pipelined loops, and synchronization. For these cases, a combination of two fine-grain, producer-initiated primitives (referred to as remote writes) is better able to reduce the latency of communication. This paper demonstrates experimentally that remote writes provide significant performance benefits in cache-coherent shared-memory multiprocessors with and without prefetching. Further, the combination of remote writes and prefetching is able to eliminate most of the memory system overhead in the applications, except misses due to cache conflicts.
Original language | English (US) |
---|---|
Pages | 204-215 |
Number of pages | 12 |
State | Published - 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 3rd International Symposium on High-Performance Computer Architecture, HPCA - San Antonio, TX, USA; Duration: Feb 1 1997 → Feb 5 1997 |
ASJC Scopus subject areas
- Hardware and Architecture