Efficient algorithm for the run-time parallelization of DOACROSS loops

Ding Kai Chen, Josep Torrellas, Pen Chung Yew

Research output: Contribution to journal › Conference article

Abstract

While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis, and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.

Original language: English (US)
Pages (from-to): 518-527
Number of pages: 10
Journal: Proceedings of the ACM/IEEE Supercomputing Conference
State: Published - Dec 1 1994
Event: Proceedings of the 1994 Supercomputing Conference - Washington, DC, USA
Duration: Nov 14 1994 - Nov 18 1994

Fingerprint

  • Data storage equipment
  • Communication

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Efficient algorithm for the run-time parallelization of DOACROSS loops. / Chen, Ding Kai; Torrellas, Josep; Yew, Pen Chung.

In: Proceedings of the ACM/IEEE Supercomputing Conference, 01.12.1994, p. 518-527.

Research output: Contribution to journal › Conference article

@article{b25abe42a6694eadb6f65bf9328f11b3,
title = "Efficient algorithm for the run-time parallelization of DOACROSS loops",
abstract = "While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis, and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.",
author = "Chen, {Ding Kai} and Josep Torrellas and Yew, {Pen Chung}",
year = "1994",
month = dec,
day = "1",
language = "English (US)",
pages = "518--527",
journal = "Proceedings of the ACM/IEEE Supercomputing Conference",
issn = "1063-9535",

}

TY - JOUR

T1 - Efficient algorithm for the run-time parallelization of DOACROSS loops

AU - Chen, Ding Kai

AU - Torrellas, Josep

AU - Yew, Pen Chung

PY - 1994/12/1

Y1 - 1994/12/1

N2 - While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis, and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.

AB - While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis, and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.

UR - http://www.scopus.com/inward/record.url?scp=0028744946&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028744946&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0028744946

SP - 518

EP - 527

JO - Proceedings of the ACM/IEEE Supercomputing Conference

JF - Proceedings of the ACM/IEEE Supercomputing Conference

SN - 1063-9535

ER -