SLO-driven right-sizing and resource provisioning of MapReduce jobs

Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

There is an increasing number of MapReduce applications (e.g., personalized advertising, spam detection, real-time event log analysis) that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analysis tools available to system administrators for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger dataset. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on a 66-node Hadoop cluster.
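Two ingredients of the framework sketched in the abstract lend themselves to a short illustration: deriving scaling factors by linear regression over measurements from smaller runs, and bounding the completion time of a phase of independent tasks executed greedily on a fixed number of slots. The sketch below is a hedged illustration of those ideas, not the paper's actual code: the function names, the single-variable regression, and the single-phase bound are assumptions for exposition (the full model covers map, shuffle, and reduce phases with separate scaling factors).

```python
def fit_scaling(sizes, durations):
    """Least-squares fit of duration ~ a + b * input_size, used to
    project a phase's duration when processing a larger dataset."""
    n = len(sizes)
    mx = sum(sizes) / n
    my = sum(durations) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(sizes, durations))
         / sum((x - mx) ** 2 for x in sizes))
    a = my - b * mx
    return lambda size: a + b * size


def makespan_bounds(durations, slots):
    """Classic bounds on the makespan of n independent tasks scheduled
    greedily on `slots` identical slots:
      lower = n * avg / slots
      upper = (n - 1) * avg / slots + max
    """
    n = len(durations)
    avg = sum(durations) / n
    return n * avg / slots, (n - 1) * avg / slots + max(durations)


# Project a map-task duration measured at input sizes 1..3 (GB) to 4 GB.
project = fit_scaling([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Bound the completion time of four tasks on two map slots.
lo, hi = makespan_bounds([2.0, 3.0, 4.0, 5.0], slots=2)
```

A capacity planner can invert the upper-bound formula: given an SLO deadline, solve for the smallest number of slots whose upper bound still meets the deadline, which yields the resource provisioning options the abstract describes.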

Original language: English (US)
Title of host publication: HP Laboratories Technical Report
Edition: 126
State: Published - Aug 31 2011


ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Verma, A., Cherkasova, L., & Campbell, R. H. (2011). SLO-driven right-sizing and resource provisioning of MapReduce jobs. In HP Laboratories Technical Report (126 ed.).


TY - GEN

T1 - SLO-driven right-sizing and resource provisioning of MapReduce jobs

AU - Verma, Abhishek

AU - Cherkasova, Ludmila

AU - Campbell, Roy H.

PY - 2011/8/31

Y1 - 2011/8/31

N2 - There is an increasing number of MapReduce applications (e.g., personalized advertising, spam detection, real-time event log analysis) that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analysis tools available to system administrators for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger dataset. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on a 66-node Hadoop cluster.

UR - http://www.scopus.com/inward/record.url?scp=80052103882&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052103882&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:80052103882

BT - HP Laboratories Technical Report

ER -