TY - GEN
T1 - Hybrid preemptive scheduling of MPI applications on the grids
AU - Bouteiller, Aurélien
AU - Bouziane, Hinde Lilia
AU - Herault, Thomas
AU - Lemarinier, Pierre
AU - Cappello, Franck
PY - 2004
Y1 - 2004
N2 - Time sharing between cluster resources in the Grid is a major issue in cluster and Grid integration. Classical Grid architecture involves a higher-level scheduler which submits non-overlapping jobs to the independent batch schedulers of each cluster of the Grid. The sequentiality induced by this approach does not fit the expected number of users and the job heterogeneity of Grids. Time-sharing techniques address this issue by allowing simultaneous execution of many applications on the same resources. Co-scheduling and gang scheduling are the two best-known techniques for time sharing cluster resources. Co-scheduling relies on the operating system of each node to schedule the processes of every application. Gang scheduling ensures that the same application is scheduled on all nodes simultaneously. Previous work has proven that co-scheduling techniques outperform gang scheduling when physical memory is not exhausted. In this paper, we introduce a new hybrid sharing technique providing checkpoint-based explicit memory management. It consists of co-scheduling parallel applications within a set until the memory capacity of the node is reached, and using gang-scheduling-related techniques to switch from one set to another. We experimentally compare the merits of the three solutions (co-scheduling, gang scheduling, and hybrid scheduling) in the context of out-of-core computing, which is likely to occur in the Grid context, where many users share the same resources. The experiments show that the hybrid solution is as efficient as the co-scheduling technique when physical memory is not exhausted, and is more efficient than both gang scheduling and co-scheduling when physical memory is exhausted.
AB - Time sharing between cluster resources in the Grid is a major issue in cluster and Grid integration. Classical Grid architecture involves a higher-level scheduler which submits non-overlapping jobs to the independent batch schedulers of each cluster of the Grid. The sequentiality induced by this approach does not fit the expected number of users and the job heterogeneity of Grids. Time-sharing techniques address this issue by allowing simultaneous execution of many applications on the same resources. Co-scheduling and gang scheduling are the two best-known techniques for time sharing cluster resources. Co-scheduling relies on the operating system of each node to schedule the processes of every application. Gang scheduling ensures that the same application is scheduled on all nodes simultaneously. Previous work has proven that co-scheduling techniques outperform gang scheduling when physical memory is not exhausted. In this paper, we introduce a new hybrid sharing technique providing checkpoint-based explicit memory management. It consists of co-scheduling parallel applications within a set until the memory capacity of the node is reached, and using gang-scheduling-related techniques to switch from one set to another. We experimentally compare the merits of the three solutions (co-scheduling, gang scheduling, and hybrid scheduling) in the context of out-of-core computing, which is likely to occur in the Grid context, where many users share the same resources. The experiments show that the hybrid solution is as efficient as the co-scheduling technique when physical memory is not exhausted, and is more efficient than both gang scheduling and co-scheduling when physical memory is exhausted.
UR - http://www.scopus.com/inward/record.url?scp=19944363534&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19944363534&partnerID=8YFLogxK
U2 - 10.1109/GRID.2004.39
DO - 10.1109/GRID.2004.39
M3 - Conference contribution
AN - SCOPUS:19944363534
SN - 0769522564
T3 - Proceedings - IEEE/ACM International Workshop on Grid Computing
SP - 130
EP - 137
BT - Proceedings - Fifth IEEE/ACM International Workshop on Grid Computing
PB - IEEE Computer Society
T2 - Proceedings - Fifth IEEE/ACM International Workshop on Grid Computing
Y2 - 8 November 2004
ER -