TY - GEN
T1 - Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition
AU - Shin, Jungeun
AU - Arroyo, Diana
AU - Tantawi, Asser
AU - Wang, Chen
AU - Youssef, Alaa
AU - Nagi, Rakesh
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/11/20
Y1 - 2024/11/20
N2 - As cloud-native workflow orchestration tools become increasingly important for complex data science workloads, there is a growing need for more efficient scheduling. Existing cloud schedulers rely on basic heuristics and user choice for task partitioning for parallel computing, leading to under-utilization of cluster resources and prolonged job completion times. To address this, we propose a novel workflow scheduling algorithm that leverages workflow characteristics to enhance resource utilization and reduce weighted job completion time. The algorithm combines three sub-algorithms, each reflecting a distinct aspect of the scheduling strategy: 1) Hybrid Maximum Children (MC) -Weighted Shortest Critical Path Time (WSCPT) rule alternates between two heuristics, MC and WSCPT, which prioritize jobs based on workflow structure and critical path, respectively. The choice between these heuristics is dynamically adjusted according to the cluster queue size. 2) Dynamic Resource Allocation (DRA), which dynamically adjusts the number of executors assigned to each workflow, and 3) Dynamic Task Partition (DTP), which autonomously determines the task parallelism level. We tested our algorithm with extensive experiments on various workflow types using Spark-imitated simulation. Our algorithm outperformed other schedulers, including learning-based models, by reducing 21-47% of the combined performance of average job completion time and makespan for unweighted workflows and reducing at least 50% of weighted job completion time for weighted workflows.
AB - As cloud-native workflow orchestration tools become increasingly important for complex data science workloads, there is a growing need for more efficient scheduling. Existing cloud schedulers rely on basic heuristics and user choice for task partitioning for parallel computing, leading to under-utilization of cluster resources and prolonged job completion times. To address this, we propose a novel workflow scheduling algorithm that leverages workflow characteristics to enhance resource utilization and reduce weighted job completion time. The algorithm combines three sub-algorithms, each reflecting a distinct aspect of the scheduling strategy: 1) Hybrid Maximum Children (MC) -Weighted Shortest Critical Path Time (WSCPT) rule alternates between two heuristics, MC and WSCPT, which prioritize jobs based on workflow structure and critical path, respectively. The choice between these heuristics is dynamically adjusted according to the cluster queue size. 2) Dynamic Resource Allocation (DRA), which dynamically adjusts the number of executors assigned to each workflow, and 3) Dynamic Task Partition (DTP), which autonomously determines the task parallelism level. We tested our algorithm with extensive experiments on various workflow types using Spark-imitated simulation. Our algorithm outperformed other schedulers, including learning-based models, by reducing 21-47% of the combined performance of average job completion time and makespan for unweighted workflows and reducing at least 50% of weighted job completion time for weighted workflows.
KW - Cloud native computing
KW - Dynamic resource allocation
KW - job scheduling
KW - task partitioning
KW - workflow scheduling
UR - http://www.scopus.com/inward/record.url?scp=85215508465&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215508465&partnerID=8YFLogxK
U2 - 10.1145/3698038.3698551
DO - 10.1145/3698038.3698551
M3 - Conference contribution
AN - SCOPUS:85215508465
T3 - SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
SP - 830
EP - 846
BT - SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
PB - Association for Computing Machinery
T2 - 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Y2 - 20 November 2024 through 22 November 2024
ER -