Deadline-based workload management for MapReduce environments: Pieces of the performance puzzle

Abhishek Verma, Ludmila Cherkasova, Vijay S. Kumar, Roy H. Campbell

Research output: Contribution to journal › Article

Abstract

Hadoop and the associated MapReduce paradigm have become the de facto platform for cost-effective analytics over "Big Data". There is an increasing number of MapReduce applications associated with live business intelligence that require completion time guarantees. In this work, we introduce and analyze a set of complementary mechanisms that enhance workload management decisions for processing MapReduce jobs with deadlines. The three mechanisms we consider are the following: 1) a policy for job ordering in the processing queue; 2) a mechanism for allocating a tailored number of map and reduce slots to each job with a completion time requirement; 3) a mechanism for allocating and deallocating (if necessary) spare resources in the system among the active jobs. We analyze the functionality and performance benefits of each mechanism via an extensive set of simulations over diverse workload sets. The proposed mechanisms form the integral pieces in the performance puzzle of automated workload management in MapReduce environments.
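
The three mechanisms listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the ordering policy or the slot-sizing model, so the sketch assumes Earliest Deadline First ordering, perfectly divisible work measured in task-seconds, an even split of the time budget between the map and reduce stages, and round-robin distribution of spare slots. The names `Job`, `min_slots`, and `plan` are hypothetical, and deallocation of spare slots is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float      # seconds from now until the job must finish
    map_work: float      # total map task-seconds (assumed known, e.g. from a profile)
    reduce_work: float   # total reduce task-seconds (assumed known)


def min_slots(work: float, budget: float) -> int:
    """Smallest slot count that finishes `work` task-seconds within `budget`
    seconds, assuming the work divides perfectly across slots."""
    return max(1, math.ceil(work / budget))


def plan(jobs, map_slots, reduce_slots):
    """Return {job name: [map slots, reduce slots]} for the jobs that fit."""
    # Mechanism 1: order the queue (assumption: Earliest Deadline First).
    ordered = sorted(jobs, key=lambda j: j.deadline)

    free_map, free_reduce = map_slots, reduce_slots
    alloc = {}
    for job in ordered:
        if job.deadline <= 0:
            continue  # deadline already passed; skipped in this sketch
        # Mechanism 2: tailor the allocation to the deadline
        # (assumption: half of the budget goes to each stage).
        stage_budget = job.deadline / 2
        m = min_slots(job.map_work, stage_budget)
        r = min_slots(job.reduce_work, stage_budget)
        if m > free_map or r > free_reduce:
            continue  # not enough capacity right now; the job stays queued
        alloc[job.name] = [m, r]
        free_map -= m
        free_reduce -= r

    # Mechanism 3: hand spare slots to active jobs, one at a time,
    # starting with the earliest deadline (assumption: round-robin).
    active = list(alloc)
    for stage, spare in ((0, free_map), (1, free_reduce)):
        for i in range(spare if active else 0):
            alloc[active[i % len(active)]][stage] += 1
    return alloc
```

Calling `plan([Job("b", 200, 400, 200), Job("a", 100, 300, 100)], 20, 10)` places job `a` ahead of job `b` because its deadline is earlier, sizes each job's minimal allocation, and then spreads the remaining slots across both jobs.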

Original language: English (US)
Journal: HP Laboratories Technical Report
Issue number: 82
State: Published - May 4 2012

Fingerprint

Competitive intelligence
Processing
Costs
Big data

Keywords

  • Hadoop
  • Job scheduling
  • MapReduce
  • Performance
  • Resource allocation

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Deadline-based workload management for MapReduce environments: Pieces of the performance puzzle. / Verma, Abhishek; Cherkasova, Ludmila; Kumar, Vijay S.; Campbell, Roy H.

In: HP Laboratories Technical Report, No. 82, 04.05.2012.

@article{19653d5a53c14d37b05a2c75de069efc,
title = "Deadline-based workload management for MapReduce environments: Pieces of the performance puzzle",
abstract = "Hadoop and the associated MapReduce paradigm have become the de facto platform for cost-effective analytics over {"}Big Data{"}. There is an increasing number of MapReduce applications associated with live business intelligence that require completion time guarantees. In this work, we introduce and analyze a set of complementary mechanisms that enhance workload management decisions for processing MapReduce jobs with deadlines. The three mechanisms we consider are the following: 1) a policy for job ordering in the processing queue; 2) a mechanism for allocating a tailored number of map and reduce slots to each job with a completion time requirement; 3) a mechanism for allocating and deallocating (if necessary) spare resources in the system among the active jobs. We analyze the functionality and performance benefits of each mechanism via an extensive set of simulations over diverse workload sets. The proposed mechanisms form the integral pieces in the performance puzzle of automated workload management in MapReduce environments.",
keywords = "Hadoop, Job scheduling, MapReduce, Performance, Resource allocation",
author = "Verma, Abhishek and Cherkasova, Ludmila and Kumar, {Vijay S.} and Campbell, {Roy H.}",
year = "2012",
month = "5",
day = "4",
language = "English (US)",
journal = "HP Laboratories Technical Report",
number = "82",

}

TY - JOUR

T1 - Deadline-based workload management for MapReduce environments

T2 - Pieces of the performance puzzle

AU - Verma, Abhishek

AU - Cherkasova, Ludmila

AU - Kumar, Vijay S.

AU - Campbell, Roy H.

PY - 2012/5/4

Y1 - 2012/5/4

AB - Hadoop and the associated MapReduce paradigm have become the de facto platform for cost-effective analytics over "Big Data". There is an increasing number of MapReduce applications associated with live business intelligence that require completion time guarantees. In this work, we introduce and analyze a set of complementary mechanisms that enhance workload management decisions for processing MapReduce jobs with deadlines. The three mechanisms we consider are the following: 1) a policy for job ordering in the processing queue; 2) a mechanism for allocating a tailored number of map and reduce slots to each job with a completion time requirement; 3) a mechanism for allocating and deallocating (if necessary) spare resources in the system among the active jobs. We analyze the functionality and performance benefits of each mechanism via an extensive set of simulations over diverse workload sets. The proposed mechanisms form the integral pieces in the performance puzzle of automated workload management in MapReduce environments.

KW - Hadoop

KW - Job scheduling

KW - MapReduce

KW - Performance

KW - Resource allocation

UR - http://www.scopus.com/inward/record.url?scp=84860376761&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860376761&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84860376761

JO - HP Laboratories Technical Report

JF - HP Laboratories Technical Report

IS - 82

ER -