Abstract
Parallel processor systems built so far can execute only singly nested parallel loops in parallel. However, it is crucial to exploit the multidimensional parallelism that occurs in multiply nested parallel loops. Schemes for efficiently executing arbitrarily nested loops in parallel will allow us to exploit (and therefore develop) computer systems with hundreds or thousands of processors. In this paper, we discuss issues in program parallelism and processor allocation for parallel processor systems. Optimal processor assignment algorithms are presented for simple and complex nested parallel loops. The compiler can use these processor assignment schemes to perform static processor allocation for multiply nested parallel loops. Speedup measurements for EISPACK and IEEE DSP subroutines that result from the optimal assignment of processors to parallel loops are also presented. These measurements indicate that optimal processor assignments yield almost linear speedups on parallel processor machines with a few tens of processors, and significantly high speedups on machines with hundreds or thousands of processors.
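To make the processor allocation problem concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes a perfectly nested set of parallel loops whose iterations all cost the same, ignores scheduling and synchronization overhead, and finds the best split of processors by brute-force search. The names `parallel_time` and `best_allocation` and the cost model are illustrative assumptions.

```python
from itertools import product
from math import ceil, prod


def parallel_time(bounds, alloc):
    """Iterations run by the busiest processor when loop i, with bounds[i]
    iterations, is spread over alloc[i] processors (assumed cost model)."""
    return prod(ceil(n / p) for n, p in zip(bounds, alloc))


def best_allocation(bounds, processors):
    """Brute-force search (hypothetical helper, not the paper's method) for
    the per-loop processor counts that minimize parallel_time, subject to
    the product of the counts not exceeding `processors`."""
    best = None
    # Giving a loop more processors than it has iterations never helps.
    ranges = [range(1, min(n, processors) + 1) for n in bounds]
    for alloc in product(*ranges):
        if prod(alloc) > processors:
            continue
        t = parallel_time(bounds, alloc)
        if best is None or t < best[0]:
            best = (t, alloc)
    return best
```

Under this model, a 4 x 4 loop nest on 8 processors takes 4 time steps if only the outer loop is run in parallel (`parallel_time((4, 4), (4, 1))`), while `best_allocation((4, 4), 8)` finds a two-dimensional split that takes 2. The search is exponential in the nesting depth, so it is useful only as an illustration of the problem statement.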
Original language | English (US) |
---|---|
Pages (from-to) | 1285-1296 |
Number of pages | 12 |
Journal | IEEE Transactions on Computers |
Volume | 38 |
Issue number | 9 |
State | Published - Sep 1989 |
Keywords
- Parallel loops
- parallel processor system
- processor allocation
- restructuring compilers
- speedup
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics