Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Marcelo Cintra, Jose F. Martinez, Josep Torrellas

Research output: Contribution to journal, Conference article

Abstract

Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems. In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. This design needs relatively simple hardware and is efficiently integrated into a cache-coherent NUMA system. We have designed the scheme in a hierarchical manner that largely abstracts away the internals of the node. We effectively utilize a speculative CMP as the building block for our scheme. Simulations show that the architecture proposed delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report average speedups of 4.2 for 16 processors. We show that support for per-word speculative state is required by our applications, or else the performance suffers greatly.

Original language: English (US)
Pages (from-to): 13-24
Number of pages: 12
Journal: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
State: Published - Jan 1 2000
Event: ISCA-27: The 27th Annual International Symposium on Computer Architecture - Vancouver, BC, Can
Duration: Jun 10 2000 to Jun 14 2000

Fingerprint

Hardware
Data storage equipment
Costs

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

@article{603a735246bf4a9ab39b7b33a4de34c6,
title = "Architectural support for scalable speculative parallelization in shared-memory multiprocessors",
abstract = "Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems. In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. This design needs relatively simple hardware and is efficiently integrated into a cache-coherent NUMA system. We have designed the scheme in a hierarchical manner that largely abstracts away the internals of the node. We effectively utilize a speculative CMP as the building block for our scheme. Simulations show that the architecture proposed delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report average speedups of 4.2 for 16 processors. We show that support for per-word speculative state is required by our applications, or else the performance suffers greatly.",
author = "Marcelo Cintra and Martinez, {Jose F.} and Josep Torrellas",
year = "2000",
month = "1",
day = "1",
language = "English (US)",
pages = "13--24",
journal = "Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA",
issn = "1063-6897",
publisher = "Institute of Electrical and Electronics Engineers Inc."
}

TY - JOUR

T1 - Architectural support for scalable speculative parallelization in shared-memory multiprocessors

AU - Cintra, Marcelo

AU - Martinez, Jose F.

AU - Torrellas, Josep

PY - 2000/1/1

Y1 - 2000/1/1

AB - Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems. In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. This design needs relatively simple hardware and is efficiently integrated into a cache-coherent NUMA system. We have designed the scheme in a hierarchical manner that largely abstracts away the internals of the node. We effectively utilize a speculative CMP as the building block for our scheme. Simulations show that the architecture proposed delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report average speedups of 4.2 for 16 processors. We show that support for per-word speculative state is required by our applications, or else the performance suffers greatly.

UR - http://www.scopus.com/inward/record.url?scp=0033689702&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033689702&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0033689702

SP - 13

EP - 24

JO - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA

JF - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA

SN - 1063-6897

ER -