TY - JOUR
T1 - On the automatic parallelization of the Perfect Benchmarks®
AU - Eigenmann, Rudolf
AU - Hoeflinger, Jay
AU - Padua, David
N1 - Funding Information: This work was supported by U.S. Army contract #DABT63-95-C-0097 and the U.S. Department of Energy under grant #DOE DE-FG02-85ER25001. This work is not necessarily representative of the positions or policies of the U.S. Army or the U.S. Government. Some of the experiments described in this paper were done by Greg Jaxon and Zhiyuan Li while they were members of our research group at CSRD. Their contributions were essential for the success of this project.
PY - 1998
Y1 - 1998
AB - This paper presents the results of the Cedar Hand-Parallelization Experiment, conducted from 1989 through 1992, within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment, we manually transformed the Perfect Benchmarks® into parallel program versions. In doing so, we used techniques that may be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique. The results presented here extend the findings previously reported in [11]. The techniques credited most for the performance gains include array privatization, parallelization of reduction operations, and the substitution of generalized induction variables. All these techniques can be considered extensions of transformations that were available in vectorizers and commercial restructuring compilers of the late 1980s. We applied these transformations by hand to the given programs, in a mechanical manner, similar to that of a parallelizing compiler. Because of our success with these transformations, we believed that it would be possible to implement many of these techniques in a new parallelizing compiler. Such a compiler has been completed in the meantime and we show preliminary results.
KW - Parallelization techniques
KW - Performance evaluation
KW - Program parallelization
KW - Restructuring compilers
UR - http://www.scopus.com/inward/record.url?scp=0031699606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031699606&partnerID=8YFLogxK
U2 - 10.1109/71.655238
DO - 10.1109/71.655238
M3 - Article
AN - SCOPUS:0031699606
SN - 1045-9219
VL - 9
SP - 5
EP - 23
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 1
ER -