Performance implications of synchronization support for parallel Fortran programs

Sadun Anik, Wen-Mei W Hwu

Research output: Contribution to journal › Article

Abstract

This paper studies the performance implications of architectural synchronization support for automatically parallelized numerical programs. As the basis for this work, we analyze the synchronization needs of such programs, which arise from task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops. Next, we identify how various forms of hardware synchronization support can be used to satisfy these software synchronization needs. The synchronization primitives studied are test-and-set, fetch-and-add, and exchange-byte operations. In addition, synchronization bus implementations of lock/unlock and fetch-and-add operations are considered. Finally, we ran experiments to quantify the impact of the various forms of architectural support on the performance of a bus-based shared-memory multiprocessor running automatically parallelized numerical programs. We found that supporting an atomic fetch-and-add primitive in shared memory is as effective as supporting lock/unlock operations with a synchronization bus. Both achieve substantial performance improvement over the cases where atomic test-and-set and exchange-byte operations are supported in shared memory.
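
The difference between the primitives named in the abstract can be illustrated with iteration scheduling. The following is a minimal sketch in C11, not the algorithm from the paper: it assumes POSIX threads and the standard `<stdatomic.h>` operations, and every name in it (`par_loop`, `claim_fetch_add`, `claim_test_and_set`, `run_loop`, `MAX_THREADS`) is hypothetical. With fetch-and-add, a processor claims the next loop iteration in one atomic operation. With test-and-set, the same counter has to be guarded by a spin lock, so each claim costs a lock acquire, an update, and a release.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16  /* hypothetical cap for this sketch */

/* Shared state of one parallel loop; all names are hypothetical. */
typedef struct {
    int         n_iters;        /* total number of loop iterations */
    int         use_fetch_add;  /* 1: fetch-and-add, 0: test-and-set lock */
    atomic_int  next_faa;       /* next free iteration, claimed atomically */
    atomic_flag lock;           /* spin lock built on test-and-set */
    int         next_locked;    /* next free iteration, guarded by lock */
    atomic_long checksum;       /* sum of the iteration indices executed */
} par_loop;

/* Claim an iteration with a single atomic fetch-and-add. */
static int claim_fetch_add(par_loop *l) {
    return atomic_fetch_add(&l->next_faa, 1);
}

/* Claim an iteration with a test-and-set spin lock around a plain counter. */
static int claim_test_and_set(par_loop *l) {
    while (atomic_flag_test_and_set(&l->lock)) {
        /* spin until the previous holder clears the flag */
    }
    int i = l->next_locked++;
    atomic_flag_clear(&l->lock);
    return i;
}

/* Each worker keeps claiming iterations until the loop is exhausted. */
static void *worker(void *arg) {
    par_loop *l = arg;
    for (;;) {
        int i = l->use_fetch_add ? claim_fetch_add(l) : claim_test_and_set(l);
        if (i >= l->n_iters) {
            break;
        }
        atomic_fetch_add(&l->checksum, i);  /* stand-in for the loop body */
    }
    return NULL;
}

/* Run the loop on n_threads workers and return the checksum. */
long run_loop(int n_threads, int n_iters, int use_fetch_add) {
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    par_loop l = { .n_iters = n_iters, .use_fetch_add = use_fetch_add,
                   .lock = ATOMIC_FLAG_INIT };
    pthread_t t[MAX_THREADS];
    for (int k = 0; k < n_threads; k++) {
        int rc = pthread_create(&t[k], NULL, worker, &l);
        assert(rc == 0);
        (void)rc;
    }
    for (int k = 0; k < n_threads; k++) {
        pthread_join(t[k], NULL);
    }
    return atomic_load(&l.checksum);
}
```

Both `run_loop(4, 1000, 1)` and `run_loop(4, 1000, 0)` return 499500, the sum of the indices 0 through 999, which confirms that each iteration is executed exactly once under either scheme. The sketch says nothing about relative speed; the performance comparison is the subject of the paper's experiments.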

Original language: English (US)
Pages (from-to): 202-215
Number of pages: 14
Journal: Journal of Parallel and Distributed Computing
Volume: 22
Issue number: 2
DOI: 10.1006/jpdc.1994.1081
State: Published - Aug 1994

Fingerprint

Synchronization
Shared Memory
Data storage equipment
Scheduling
Data Dependence
Shared-memory multiprocessors
Data handling
Task Scheduling
Quantify
Hardware
Iteration
Software
Experiment
Experiments

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Performance implications of synchronization support for parallel Fortran programs. / Anik, Sadun; Hwu, Wen-Mei W.

In: Journal of Parallel and Distributed Computing, Vol. 22, No. 2, 08.1994, p. 202-215.

@article{7661ab52504a41689c7074cbc329b537,
title = "Performance implications of synchronization support for parallel {Fortran} programs",
abstract = "This paper studies the performance implications of architectural synchronization support for automatically parallelized numerical programs. As the basis for this work, we analyze the needs for synchronization in automatically parallelized numerical programs. The needs are due to task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops. Next, we identify how various hardware synchronization support can be used to satisfy these software synchronization needs. The synchronization primitives studied are test and set, fetch and add, and exchange-byte operations. In addition to these, synchronization bus implementation of lock/unlock and fetch and add operations are also considered. Finally, we ran experiments to quantify the impact of various architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs. We found that supporting an atomic fetch and add primitive in shared memory is as effective as supporting lock/unlock operations with a synchronization bus. Both achieve substantial performance improvement over the cases where atomic test and set and exchange-byte operations are supported in shared memory.",
author = "Anik, Sadun and Hwu, {Wen-Mei W.}",
year = "1994",
month = aug,
doi = "10.1006/jpdc.1994.1081",
language = "English (US)",
volume = "22",
pages = "202--215",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "2",
}

TY - JOUR
T1 - Performance implications of synchronization support for parallel Fortran programs
AU - Anik, Sadun
AU - Hwu, Wen-Mei W
PY - 1994/8
Y1 - 1994/8
AB - This paper studies the performance implications of architectural synchronization support for automatically parallelized numerical programs. As the basis for this work, we analyze the needs for synchronization in automatically parallelized numerical programs. The needs are due to task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops. Next, we identify how various hardware synchronization support can be used to satisfy these software synchronization needs. The synchronization primitives studied are test and set, fetch and add, and exchange-byte operations. In addition to these, synchronization bus implementation of lock/unlock and fetch and add operations are also considered. Finally, we ran experiments to quantify the impact of various architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs. We found that supporting an atomic fetch and add primitive in shared memory is as effective as supporting lock/unlock operations with a synchronization bus. Both achieve substantial performance improvement over the cases where atomic test and set and exchange-byte operations are supported in shared memory.
UR - http://www.scopus.com/inward/record.url?scp=43949161000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43949161000&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1994.1081
DO - 10.1006/jpdc.1994.1081
M3 - Article
AN - SCOPUS:43949161000
VL - 22
SP - 202
EP - 215
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
SN - 0743-7315
IS - 2
ER -