TY - GEN
T1 - The Packing Server for real-time scheduling of MapReduce workflows
AU - Li, Shen
AU - Hu, Shaohan
AU - Abdelzaher, Tarek
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/5/14
Y1 - 2015/5/14
N2 - This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.
AB - This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is dependent on the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization.
UR - http://www.scopus.com/inward/record.url?scp=84944683703&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84944683703&partnerID=8YFLogxK
U2 - 10.1109/RTAS.2015.7108416
DO - 10.1109/RTAS.2015.7108416
M3 - Conference contribution
AN - SCOPUS:84944683703
T3 - Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS
SP - 51
EP - 62
BT - Proceedings - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st IEEE Real Time and Embedded Technology and Applications Symposium, RTAS 2015
Y2 - 13 April 2015 through 16 April 2015
ER -