Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition

Jungeun Shin, Diana Arroyo, Asser Tantawi, Chen Wang, Alaa Youssef, Rakesh Nagi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

As cloud-native workflow orchestration tools become increasingly important for complex data science workloads, there is a growing need for more efficient scheduling. Existing cloud schedulers rely on basic heuristics and on user-chosen task partitioning for parallel computing, which leads to under-utilization of cluster resources and prolonged job completion times. To address this, we propose a novel workflow scheduling algorithm that leverages workflow characteristics to improve resource utilization and reduce weighted job completion time. The algorithm combines three sub-algorithms, each reflecting a distinct aspect of the scheduling strategy: 1) a hybrid Maximum Children (MC)-Weighted Shortest Critical Path Time (WSCPT) rule, which alternates between two heuristics, MC and WSCPT, that prioritize jobs based on workflow structure and critical path, respectively, with the choice between them adjusted dynamically according to the cluster queue size; 2) Dynamic Resource Allocation (DRA), which dynamically adjusts the number of executors assigned to each workflow; and 3) Dynamic Task Partition (DTP), which autonomously determines the task parallelism level. We evaluated the algorithm in extensive experiments on various workflow types using a simulation that imitates Spark. It outperformed other schedulers, including learning-based models, reducing the combined measure of average job completion time and makespan by 21-47% for unweighted workflows and reducing weighted job completion time by at least 50% for weighted workflows.
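
The hybrid rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which heuristic applies at which queue size, what the threshold is, or how the two priorities are computed. The `Workflow` fields, the `queue_threshold` parameter, the direction of the switch, and the definition of WSCPT as critical-path time divided by weight (by analogy with the classical weighted shortest processing time rule) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Workflow:
    """Hypothetical summary of a queued workflow (all fields are illustrative)."""
    name: str
    weight: float              # importance; must be positive, larger is more urgent
    num_children: int          # number of child stages unlocked by the next stage
    critical_path_time: float  # estimated remaining critical-path time


def mc_key(wf: Workflow) -> float:
    # Maximum Children: more children sorts first, hence the negation.
    return -wf.num_children


def wscpt_key(wf: Workflow) -> float:
    # Weighted Shortest Critical Path Time, ASSUMED here to be the
    # critical-path time divided by the weight (smaller sorts first).
    return wf.critical_path_time / wf.weight


def pick_next(queue: Sequence[Workflow],
              queue_threshold: int = 3) -> Optional[Workflow]:
    """Return the workflow to schedule next, or None if the queue is empty."""
    if not queue:
        return None
    # ASSUMPTION: WSCPT for long queues, MC for short ones. Neither the
    # threshold value nor the direction of the switch is given in the abstract.
    key = wscpt_key if len(queue) > queue_threshold else mc_key
    return min(queue, key=key)


queue = [
    Workflow("a", weight=1.0, num_children=5, critical_path_time=10.0),
    Workflow("b", weight=2.0, num_children=1, critical_path_time=4.0),
    Workflow("c", weight=1.0, num_children=3, critical_path_time=3.0),
    Workflow("d", weight=4.0, num_children=2, critical_path_time=20.0),
]
print(pick_next(queue).name)      # four queued workflows: WSCPT rule applies
print(pick_next(queue[:3]).name)  # three queued workflows: MC rule applies
```

Keeping each heuristic as a plain key function means the switch reduces to choosing which key to pass to `min`, so a different threshold policy can be tried without touching the heuristics themselves.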

Original language: English (US)
Title of host publication: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery
Pages: 830-846
Number of pages: 17
ISBN (Electronic): 9798400712869
State: Published - Nov 20 2024
Event: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024 - Redmond, United States
Duration: Nov 20 2024 to Nov 22 2024

Publication series

Name: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

Conference

Conference: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Country/Territory: United States
City: Redmond
Period: 11/20/24 to 11/22/24

Keywords

  • Cloud native computing
  • Dynamic resource allocation
  • job scheduling
  • task partitioning
  • workflow scheduling

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Networks and Communications
  • Computer Science Applications
