WOHA: Deadline-aware Map-Reduce workflow scheduling framework over Hadoop clusters

Shen Li, Shaohan Hu, Shiguang Wang, Lu Su, Tarek Abdelzaher, Indranil Gupta, Richard Pace

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

In this paper, we present WOHA, an efficient scheduling framework for deadline-aware Map-Reduce workflows. In data centers, complex backend data analysis often uses a workflow that contains tens or even hundreds of interdependent Map-Reduce jobs. Meeting the deadlines of these workflows is usually crucial to businesses (for example, workflows tightly linked to time-sensitive advertisement placement optimizations can directly affect revenue). Popular Map-Reduce implementations, such as Hadoop, handle independent Map-Reduce jobs rather than workflows of jobs. To simplify the process of submitting workflows, solutions such as Oozie have emerged; they take a workflow configuration file as input and automatically submit its Hadoop jobs at the right time. In this separation of information, Hadoop handles only resource allocation while Oozie handles only workflow topology. Although this keeps the Hadoop master node out of complex workflow analysis, it may unnecessarily lengthen workflow spans and thus cause more deadline misses. To address this problem while preserving the efficiency of the Hadoop master node, WOHA lets client nodes generate scheduling plans locally, which the master node later uses as resource allocation hints. Within this framework, we propose a novel scheduling algorithm that improves the deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progress. We implement WOHA by extending Hadoop-1.2.1. Our experiments on an 80-server cluster show that WOHA increases the deadline satisfaction ratio by 10% compared to state-of-the-art solutions and scales to tens of thousands of concurrently running workflows.
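The abstract splits the work in two: clients turn a workflow's topology into a scheduling plan, and the master uses that plan as a hint when prioritizing workflows by their progress. The Python sketch below illustrates that division of labor under stated assumptions; it is not WOHA's algorithm. The backward latest-finish-time plan and the least-slack-first rule are generic stand-ins, and `Job`, `latest_finish_times`, `slack`, `pick_workflow`, and the per-job runtime estimates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Job:
    """One Map-Reduce job in a workflow DAG (hypothetical model)."""
    name: str
    est_duration: float  # assumed runtime estimate, same unit as deadlines
    deps: List[str] = field(default_factory=list)  # jobs that must finish first


def latest_finish_times(jobs: Dict[str, Job], deadline: float) -> Dict[str, float]:
    """Client-side plan (assumed form): the latest time each job may finish
    while the workflow can still meet `deadline`, computed backward over the DAG."""
    successors: Dict[str, List[str]] = {name: [] for name in jobs}
    for job in jobs.values():
        for dep in job.deps:
            successors[dep].append(job.name)

    plan: Dict[str, float] = {}

    def visit(name: str) -> float:
        if name not in plan:
            succ = successors[name]
            # A sink job may finish as late as the deadline; any other job must
            # finish before the latest start time of every successor.
            plan[name] = deadline if not succ else min(
                visit(s) - jobs[s].est_duration for s in succ)
        return plan[name]

    for name in jobs:
        visit(name)
    return plan


def slack(jobs: Dict[str, Job], plan: Dict[str, float],
          done: Set[str], now: float) -> float:
    """How far ahead of its plan a workflow is: the earliest latest-start time
    among its unfinished jobs minus the current time (negative means behind)."""
    pending = [n for n in jobs if n not in done]
    if not pending:
        return float("inf")  # a finished workflow never needs priority
    return min(plan[n] - jobs[n].est_duration for n in pending) - now


def pick_workflow(workflows: Dict[str, Tuple[Dict[str, Job], Dict[str, float], Set[str]]],
                  now: float) -> str:
    """Master-side use of the hints (stand-in rule): favor the least-slack workflow."""
    return min(workflows, key=lambda w: slack(*workflows[w], now))


# Usage: workflow A is a1 -> a2 with deadline 50; workflow B is a single job due at 100.
a = {"a1": Job("a1", 10), "a2": Job("a2", 20, ["a1"])}
b = {"b1": Job("b1", 5)}
plan_a = latest_finish_times(a, deadline=50)   # a1 -> 30, a2 -> 50
plan_b = latest_finish_times(b, deadline=100)  # b1 -> 100
wfs = {"A": (a, plan_a, set()), "B": (b, plan_b, set())}
print(pick_workflow(wfs, now=0))  # A (slack 20 versus 95)
```

Keeping plan construction on the client mirrors the abstract's point that the master node should not perform complex workflow analysis: in this sketch the master only compares precomputed numbers against the current time.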

Original language: English (US)
Title of host publication: Proceedings - International Conference on Distributed Computing Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 93-103
Number of pages: 11
ISBN (Electronic): 9781479951680
DOI: 10.1109/ICDCS.2014.18
State: Published - Aug 29 2014
Event: 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014 - Madrid, Spain
Duration: Jun 30 2014 to Jul 3 2014

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Other

Other: 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014
Country: Spain
City: Madrid
Period: 6/30/14 to 7/3/14

Keywords

  • Deadline
  • Hadoop
  • MapReduce
  • Scheduling
  • Workflow

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Li, S., Hu, S., Wang, S., Su, L., Abdelzaher, T., Gupta, I., & Pace, R. (2014). WOHA: Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. In Proceedings - International Conference on Distributed Computing Systems (pp. 93-103). [6888886] (Proceedings - International Conference on Distributed Computing Systems). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2014.18

WOHA : Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. / Li, Shen; Hu, Shaohan; Wang, Shiguang; Su, Lu; Abdelzaher, Tarek; Gupta, Indranil; Pace, Richard.

Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc., 2014. p. 93-103 6888886 (Proceedings - International Conference on Distributed Computing Systems).

Li, S, Hu, S, Wang, S, Su, L, Abdelzaher, T, Gupta, I & Pace, R 2014, WOHA: Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. in Proceedings - International Conference on Distributed Computing Systems, 6888886, Proceedings - International Conference on Distributed Computing Systems, Institute of Electrical and Electronics Engineers Inc., pp. 93-103, 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014, Madrid, Spain, 6/30/14. https://doi.org/10.1109/ICDCS.2014.18
Li S, Hu S, Wang S, Su L, Abdelzaher T, Gupta I et al. WOHA: Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. In Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc. 2014. p. 93-103. 6888886. (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2014.18
Li, Shen ; Hu, Shaohan ; Wang, Shiguang ; Su, Lu ; Abdelzaher, Tarek ; Gupta, Indranil ; Pace, Richard. / WOHA : Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 93-103 (Proceedings - International Conference on Distributed Computing Systems).
@inproceedings{a30b130758694618a849762cc0d96107,
title = "WOHA: Deadline-aware map-reduce workflow scheduling framework over hadoop clusters",
abstract = "In this paper, we present WOHA, an efficient scheduling framework for deadline-aware Map-Reduce workflows. In data centers, complex backend data analysis often utilizes a workflow that contains tens or even hundreds of interdependent Map-Reduce jobs. Meeting deadlines of these workflows is usually of crucial importance to businesses (for example, workflows tightly linked to time-sensitive advertisement placement optimizations can directly affect revenue). Popular Map-Reduce implementations, such as Hadoop, deal with independent Map-Reduce jobs rather than workflows of jobs. In order to simplify the process of submitting workflows, solutions like Oozie emerge, which take a workflow configuration file as input and automatically submit its Hadoop jobs at the right time. The information separation that Hadoop only handles resource allocation and Oozie workflow topology, although preventing the Hadoop master node from getting involved with complex workflow analysis, may unnecessarily lengthen the workflow spans and thus cause more deadline misses. To address this problem and at the same time honor the efficiency of Hadoop master node, WOHA allows client nodes to locally generate scheduling plans which are later used as resource allocation hints by the master node. Under this framework design, we propose a novel scheduling algorithm that improves deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progresses. We implement WOHA by extending Hadoop-1.2.1. Our experiments over an 80-server cluster show that WOHA manages to increase the deadline satisfaction ratio by 10{\%} compared to state-of-the-art solutions, and scales up to tens of thousands of concurrently running workflows.",
keywords = "Deadline, Hadoop, MapReduce, Scheduling, Workflow",
author = "Shen Li and Shaohan Hu and Shiguang Wang and Lu Su and Tarek Abdelzaher and Indranil Gupta and Richard Pace",
year = "2014",
month = "8",
day = "29",
doi = "10.1109/ICDCS.2014.18",
language = "English (US)",
series = "Proceedings - International Conference on Distributed Computing Systems",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "93--103",
booktitle = "Proceedings - International Conference on Distributed Computing Systems",
address = "United States",

}

TY - GEN

T1 - WOHA

T2 - Deadline-aware map-reduce workflow scheduling framework over hadoop clusters

AU - Li, Shen

AU - Hu, Shaohan

AU - Wang, Shiguang

AU - Su, Lu

AU - Abdelzaher, Tarek

AU - Gupta, Indranil

AU - Pace, Richard

PY - 2014/8/29

Y1 - 2014/8/29

AB - In this paper, we present WOHA, an efficient scheduling framework for deadline-aware Map-Reduce workflows. In data centers, complex backend data analysis often utilizes a workflow that contains tens or even hundreds of interdependent Map-Reduce jobs. Meeting deadlines of these workflows is usually of crucial importance to businesses (for example, workflows tightly linked to time-sensitive advertisement placement optimizations can directly affect revenue). Popular Map-Reduce implementations, such as Hadoop, deal with independent Map-Reduce jobs rather than workflows of jobs. In order to simplify the process of submitting workflows, solutions like Oozie emerge, which take a workflow configuration file as input and automatically submit its Hadoop jobs at the right time. The information separation that Hadoop only handles resource allocation and Oozie workflow topology, although preventing the Hadoop master node from getting involved with complex workflow analysis, may unnecessarily lengthen the workflow spans and thus cause more deadline misses. To address this problem and at the same time honor the efficiency of Hadoop master node, WOHA allows client nodes to locally generate scheduling plans which are later used as resource allocation hints by the master node. Under this framework design, we propose a novel scheduling algorithm that improves deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progresses. We implement WOHA by extending Hadoop-1.2.1. Our experiments over an 80-server cluster show that WOHA manages to increase the deadline satisfaction ratio by 10% compared to state-of-the-art solutions, and scales up to tens of thousands of concurrently running workflows.

KW - Deadline

KW - Hadoop

KW - MapReduce

KW - Scheduling

KW - Workflow

UR - http://www.scopus.com/inward/record.url?scp=84907783038&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907783038&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2014.18

DO - 10.1109/ICDCS.2014.18

M3 - Conference contribution

AN - SCOPUS:84907783038

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 93

EP - 103

BT - Proceedings - International Conference on Distributed Computing Systems

PB - Institute of Electrical and Electronics Engineers Inc.

ER -