TY - GEN
T1 - Exploiting large system dynamics for designing simple data center schedulers
AU - Zheng, Yousi
AU - Shroff, Ness B.
AU - Srikant, R.
AU - Sinha, Prasun
PY - 2015/8/21
Y1 - 2015/8/21
N2 - The number and size of data centers has seen a rapid growth in the last few years. It is no longer uncommon to see large data centers with thousands or even tens of thousands of machines. Hence, it is critical to develop scalable scheduling mechanisms for processing the enormous number of jobs handled by popular paradigms such as the MapReduce framework. This work explores the possibility of simplifying the scheduling procedure by exploiting the 'largeness' of the data center system. Specifically, we consider the problem of minimizing the total flow time of a sequence of jobs under the MapReduce framework, where the jobs arrive over time and need to be processed through both Map and Reduce procedures before leaving the system. We show that any work-conserving scheduler is asymptotically optimal under a wide range of traffic loads, including the heavy traffic limit. Our results are shown for scenarios in which the tasks can be preempted and served in parallel over different machines, as well as scenarios when each task has to be served only on one machine and cannot be preempted. This result implies, somewhat surprisingly, that when we have a large number of machines, there is little to be gained by optimizing beyond ensuring that a scheduler should be work-conserving. For long running applications, we also study the relationship between the number of machines and total running time, and show sufficient conditions to guarantee the asymptotic optimality of work-conserving schedulers. Further, we run extensive simulations, that indeed verify that when the total number of machines is large, state-of-the-art work-conserving schedulers have similar and close-to-optimal delay performance.
AB - The number and size of data centers has seen a rapid growth in the last few years. It is no longer uncommon to see large data centers with thousands or even tens of thousands of machines. Hence, it is critical to develop scalable scheduling mechanisms for processing the enormous number of jobs handled by popular paradigms such as the MapReduce framework. This work explores the possibility of simplifying the scheduling procedure by exploiting the 'largeness' of the data center system. Specifically, we consider the problem of minimizing the total flow time of a sequence of jobs under the MapReduce framework, where the jobs arrive over time and need to be processed through both Map and Reduce procedures before leaving the system. We show that any work-conserving scheduler is asymptotically optimal under a wide range of traffic loads, including the heavy traffic limit. Our results are shown for scenarios in which the tasks can be preempted and served in parallel over different machines, as well as scenarios when each task has to be served only on one machine and cannot be preempted. This result implies, somewhat surprisingly, that when we have a large number of machines, there is little to be gained by optimizing beyond ensuring that a scheduler should be work-conserving. For long running applications, we also study the relationship between the number of machines and total running time, and show sufficient conditions to guarantee the asymptotic optimality of work-conserving schedulers. Further, we run extensive simulations, that indeed verify that when the total number of machines is large, state-of-the-art work-conserving schedulers have similar and close-to-optimal delay performance.
UR - http://www.scopus.com/inward/record.url?scp=84954551096&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954551096&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2015.7218405
DO - 10.1109/INFOCOM.2015.7218405
M3 - Conference contribution
AN - SCOPUS:84954551096
T3 - Proceedings - IEEE INFOCOM
SP - 397
EP - 405
BT - 2015 IEEE Conference on Computer Communications, IEEE INFOCOM 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015
Y2 - 26 April 2015 through 1 May 2015
ER -