Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors

Research output: Contribution to conference › Paper › peer-review

Abstract

Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code speculatively in parallel and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as a dependence violation is detected, execution stops, the state is restored, and the code is re-executed serially. The scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. It requires hardware extensions to the cache coherence protocol and memory hierarchy of a DSM and has low overhead. In this paper, we present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 for 16 processors and is 50% faster than a related software-only method.
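The sketch below is a minimal, self-contained C illustration of the general idea described in the abstract, not the paper's hardware design: per-element shadow flags stand in for the access state that the proposed cache coherence extensions would track in hardware, iterations are assumed to be free to run in any order, and the loop body, array names, and idx[] indirection are invented for illustration.

/*
 * Sketch: speculative run-time parallelization of a loop whose
 * dependences depend on run-time data (the indirection array idx[]).
 * If a cross-iteration conflict is detected, the checkpointed state is
 * restored and the loop is re-executed serially.
 */
#include <stdio.h>
#include <string.h>

#define N 16

/* Shadow state per array element (the paper keeps analogous access
 * information in extended cache/coherence hardware, not in software). */
static int written_by[N];   /* iteration that wrote the element, or -1 */
static int read_by[N];      /* iteration that read the element, or -1  */

/* Run the loop a[idx[i]] = a[i] + 1 speculatively, recording accesses
 * and reporting any cross-iteration dependence as a violation. */
static int speculative_run(int *a, const int *idx)
{
    for (int i = 0; i < N; i++) { written_by[i] = read_by[i] = -1; }

    for (int i = 0; i < N; i++) {
        int r = i;        /* element read by iteration i    */
        int w = idx[i];   /* element written by iteration i */

        /* Read a[r]: conflict if another iteration wrote it. */
        if (written_by[r] != -1 && written_by[r] != i) return 0;
        read_by[r] = i;

        /* Write a[w]: conflict if another iteration read or wrote it. */
        if ((read_by[w] != -1 && read_by[w] != i) ||
            (written_by[w] != -1 && written_by[w] != i)) return 0;
        written_by[w] = i;

        a[w] = a[r] + 1;  /* speculative update */
    }
    return 1;  /* no dependence violation detected */
}

int main(void)
{
    int a[N], backup[N], idx[N];

    for (int i = 0; i < N; i++) { a[i] = i; idx[i] = (i + 3) % N; }

    memcpy(backup, a, sizeof a);          /* checkpoint before speculation */

    if (!speculative_run(a, idx)) {
        memcpy(a, backup, sizeof a);      /* violation: restore state */
        for (int i = 0; i < N; i++)       /* ...and re-execute serially */
            a[idx[i]] = a[i] + 1;
        puts("dependence violation: rolled back and ran serially");
    } else {
        puts("loop completed speculatively with no violations");
    }
    return 0;
}

With the idx[] pattern above, iteration 3 reads an element written by iteration 0, so the sketch detects a cross-iteration dependence, rolls back, and falls through to the serial re-execution path.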

Original language: English (US)
Pages: 162-173
Number of pages: 12
State: Published - 1998
Event: Proceedings of the 1998 4th International Symposium on High-Performance Computer Architecture, HPCA - Las Vegas, NV, USA
Duration: Jan 31, 1998 - Feb 4, 1998

Other

Other: Proceedings of the 1998 4th International Symposium on High-Performance Computer Architecture, HPCA
City: Las Vegas, NV, USA
Period: 1/31/98 - 2/4/98

ASJC Scopus subject areas

  • Hardware and Architecture

