The Packing Server for real-time scheduling of MapReduce workflows

Shen Li, Shaohan Hu, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.

Original languageEnglish (US)
Title of host publicationProceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages51-62
Number of pages12
ISBN (Electronic)9781479986033
DOIs
StatePublished - May 14 2015
Event21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015 - Seattle, United States
Duration: Apr 13 2015Apr 16 2015

Publication series

NameProceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS
Volume2015-May
ISSN (Print)1545-3421

Other

Other21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015
CountryUnited States
CitySeattle
Period4/13/154/16/15

Fingerprint

Servers
Scheduling
Distributed computer systems
Industry

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Li, S., Hu, S., & Abdelzaher, T. (2015). The Packing Server for real-time scheduling of MapReduce workflows. In Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015 (pp. 51-62). [7108416] (Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS; Vol. 2015-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/RTAS.2015.7108416

The Packing Server for real-time scheduling of MapReduce workflows. / Li, Shen; Hu, Shaohan; Abdelzaher, Tarek.

Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 51-62 7108416 (Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS; Vol. 2015-May).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Li, S, Hu, S & Abdelzaher, T 2015, The Packing Server for real-time scheduling of MapReduce workflows. in Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015., 7108416, Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, vol. 2015-May, Institute of Electrical and Electronics Engineers Inc., pp. 51-62, 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015, Seattle, United States, 4/13/15. https://doi.org/10.1109/RTAS.2015.7108416
Li S, Hu S, Abdelzaher T. The Packing Server for real-time scheduling of MapReduce workflows. In Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 51-62. 7108416. (Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS). https://doi.org/10.1109/RTAS.2015.7108416
Li, Shen ; Hu, Shaohan ; Abdelzaher, Tarek. / The Packing Server for real-time scheduling of MapReduce workflows. Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 51-62 (Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS).
@inproceedings{90e6156b23c14c00b67a948e6b56ee61,
title = "The Packing Server for real-time scheduling of MapReduce workflows",
abstract = "This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50{\%} of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.",
author = "Shen Li and Shaohan Hu and Tarek Abdelzaher",
year = "2015",
month = "5",
day = "14",
doi = "10.1109/RTAS.2015.7108416",
language = "English (US)",
series = "Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "51--62",
booktitle = "Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015",
address = "United States",

}

TY - GEN

T1 - The Packing Server for real-time scheduling of MapReduce workflows

AU - Li, Shen

AU - Hu, Shaohan

AU - Abdelzaher, Tarek

PY - 2015/5/14

Y1 - 2015/5/14

N2 - This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.

AB - This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.

UR - http://www.scopus.com/inward/record.url?scp=84944683703&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944683703&partnerID=8YFLogxK

U2 - 10.1109/RTAS.2015.7108416

DO - 10.1109/RTAS.2015.7108416

M3 - Conference contribution

AN - SCOPUS:84944683703

T3 - Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS

SP - 51

EP - 62

BT - Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -