TY - GEN
T1 - Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus
AU - Allen, Gabrielle
AU - Dramlitsch, Thomas
AU - Foster, Ian
AU - Karonis, Nicholas T.
AU - Ripeanu, Matei
AU - Seidel, Edward
AU - Toonen, Brian
N1 - Publisher Copyright: © 2001 ACM.
PY - 2001/11/10
Y1 - 2001/11/10
AB - Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than any other previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters during run time, to increase dramatically the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.
UR - http://www.scopus.com/inward/record.url?scp=30344463648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=30344463648&partnerID=8YFLogxK
U2 - 10.1145/582034.582086
DO - 10.1145/582034.582086
M3 - Conference contribution
AN - SCOPUS:30344463648
T3 - Proceedings of the International Conference on Supercomputing
SP - 52
BT - Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, SC 2001
PB - Association for Computing Machinery
T2 - 2001 ACM/IEEE Conference on Supercomputing, SC 2001
Y2 - 10 November 2001 through 16 November 2001
ER -