Architectural support for parallel reductions in scalable shared-memory multiprocessors

María Jesús Garzarán, Milos Prvulovic, Ye Zhang, Alin Jula, Hao Yu, Lawrence Rauchwerger, Josep Torrellas

Research output: Contribution to journal › Article › peer-review


Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Simulation results show that the proposed support is very effective: while conventional software-only reduction parallelization delivers an average speedup of only 2.7 on 16 processors, our scheme delivers an average speedup of 7.6.
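To make the term concrete, the following is a minimal sketch of one common software-only way to parallelize a sparse reduction of the form `w[index[i]] += value[i]`: each worker accumulates into its own private copy of the reduction array, and the copies are merged afterwards. The abstract does not specify which conventional algorithm it measures, so treating privatization with a merge step as that baseline is an assumption here. All names (`sequential_reduction`, `privatized_reduction`, `index`, `value`, `num_workers`) are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_reduction(index, value, size):
    """Reference loop: w[index[i]] += value[i] for every i."""
    w = [0] * size
    for i, v in zip(index, value):
        w[i] += v
    return w


def privatized_reduction(index, value, size, num_workers=4):
    """Illustrative software-only scheme (an assumption, not the paper's method)."""

    def partial(worker):
        # Each worker initializes a full private copy: O(size) work per worker.
        private = [0] * size
        # Iterations are distributed cyclically across the workers.
        for k in range(worker, len(index), num_workers):
            private[index[k]] += value[k]
        return private

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        privates = list(pool.map(partial, range(num_workers)))

    # Merge step: touches size * num_workers elements in total,
    # regardless of how few elements were actually updated.
    w = [0] * size
    for private in privates:
        for j in range(size):
            w[j] += private[j]
    return w


if __name__ == "__main__":
    index = [0, 3, 3, 1, 0, 3]
    value = [1, 2, 3, 4, 5, 6]
    print(privatized_reduction(index, value, size=5))
```

In this sketch the initialization and merge work depends on the array size and the number of workers rather than on the number of updates, which is one plausible reason such a scheme would scale poorly for sparse access patterns. The sketch only demonstrates the structure of the computation; it is not a performance model.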

Original language: English (US)
Pages (from-to): 243-254
Number of pages: 12
Journal: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
State: Published - 2001

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

