Two sides of a coin: Optimizing the schedule of MapReduce jobs to minimize their makespan and improve cluster performance

Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large-scale MapReduce clusters that routinely process petabytes of unstructured and semi-structured data represent a new entity in the changing landscape of clouds. A key challenge is to increase the utilization of these MapReduce clusters. In this work, we consider a subset of the production workload that consists of MapReduce jobs with no dependencies. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time and the cluster resource utilization. Our goal is to automate the design of a job schedule that minimizes the completion time (makespan) of such a set of MapReduce jobs. We offer a novel abstraction framework and a heuristic, called BalancedPools, that efficiently utilizes performance properties of MapReduce jobs in a given workload for constructing an optimized job schedule. Simulations performed over a realistic workload demonstrate that 15%-38% makespan improvements are achievable by simply processing the jobs in the right order.
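The abstract's observation that execution order alone can change the overall completion time can be illustrated with a small sketch. It is not the BalancedPools heuristic, which the abstract does not detail. It assumes a deliberately simplified model in which each job is a hypothetical (map time, reduce time) pair, the map stage of a job occupies all map slots, the reduce stage occupies all reduce slots, and a job's reduce stage starts only after its own map stage finishes. The job durations are invented for illustration.

```python
from itertools import permutations


def makespan(jobs):
    """Completion time of jobs run in the given order.

    Assumed model (not from the paper): jobs pass through the map stage
    one at a time, then through the reduce stage one at a time.
    """
    map_end = 0      # time at which the map stage becomes free
    reduce_end = 0   # time at which the reduce stage becomes free
    for map_time, reduce_time in jobs:
        map_end += map_time
        # Reduce waits for this job's maps and for the previous job's reduces.
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


# Hypothetical (map_time, reduce_time) pairs in arbitrary time units.
jobs = [(20, 2), (2, 20), (10, 10)]

given = makespan(jobs)
# Brute force over all orders; feasible only for a handful of jobs.
best_order = min(permutations(jobs), key=makespan)
best = makespan(best_order)

print("submitted order:", given)
print("best order:", list(best_order), best)
```

Under these assumptions the submitted order finishes at time 52, while the best of the six possible orders finishes at time 34. Exhaustive search grows factorially with the number of jobs, which is one reason a heuristic is attractive for realistic workloads.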

Original language: English (US)
Title of host publication: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012
Pages: 11-18
Number of pages: 8
State: Published - 2012
Event: 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012 - Washington, DC, United States
Duration: Aug 7 2012 → Aug 9 2012

Publication series

Name: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012

Other

Other: 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012
Country/Territory: United States
City: Washington, DC
Period: 8/7/12 → 8/9/12

Keywords

  • Hadoop
  • MapReduce
  • batch workloads
  • minimized makespan
  • optimized schedule

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Modeling and Simulation
  • Communication
