Resource provisioning framework for MapReduce jobs with performance goals

Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many companies increasingly use MapReduce for efficient large-scale data processing, such as personalized advertising, spam detection, and a variety of data mining tasks. Cloud computing offers businesses an attractive option: rent a suitably sized Hadoop cluster, consume resources as a service, and pay only for the resources used. One open question in such environments is how many resources a user should lease from the service provider. Often, a user has specific performance goals, and the application must complete its data processing by a given deadline. Currently, however, estimating the resources required to meet application performance goals is solely the user's responsibility. In this work, we introduce a novel framework and technique to address this problem and to offer a new resource sizing and provisioning service in MapReduce environments. For a MapReduce job that must complete within a given time, the job profile is built either from the job's past executions or by executing the application on a smaller data set using an automated profiling tool. Then, by applying scaling rules combined with a fast and efficient capacity planning model, we generate a set of resource provisioning options. Moreover, we design a model for estimating the impact of node failures on a job's completion time in order to evaluate worst-case scenarios. We validate the accuracy of our models using a set of realistic applications. The predicted completion times of the generated resource provisioning options are within 10% of the times measured on our 66-node Hadoop cluster.
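The abstract does not spell out the capacity planning model, so the following Python sketch is only an illustration of how a job profile and a deadline could be turned into a set of provisioning options. It assumes a simple makespan-bounds estimate for greedy task assignment (lower bound n·avg/k, upper bound (n−1)·avg/k + max for n tasks on k slots) and takes the midpoint of the bounds as the prediction. The names `StageProfile`, `stage_bounds`, `estimate_completion`, and `provisioning_options` are hypothetical, and the scaling rules and node-failure model mentioned above are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StageProfile:
    """Hypothetical per-stage job profile: task count and task durations (seconds)."""
    tasks: int
    avg_s: float
    max_s: float


def stage_bounds(stage: StageProfile, slots: int) -> tuple[float, float]:
    """Lower and upper makespan bounds for greedily assigning tasks to slots."""
    lower = stage.tasks * stage.avg_s / slots
    upper = (stage.tasks - 1) * stage.avg_s / slots + stage.max_s
    return lower, upper


def estimate_completion(map_stage: StageProfile, reduce_stage: StageProfile,
                        map_slots: int, reduce_slots: int) -> float:
    """Assumed estimate: sum of the per-stage midpoints of the two bounds."""
    total = 0.0
    for stage, slots in ((map_stage, map_slots), (reduce_stage, reduce_slots)):
        lower, upper = stage_bounds(stage, slots)
        total += (lower + upper) / 2
    return total


def provisioning_options(map_stage: StageProfile, reduce_stage: StageProfile,
                         deadline_s: float, max_slots: int) -> list[tuple[int, int]]:
    """For each map-slot count, find the fewest reduce slots meeting the deadline."""
    options = []
    for m in range(1, max_slots + 1):
        for r in range(1, max_slots + 1):
            if estimate_completion(map_stage, reduce_stage, m, r) <= deadline_s:
                options.append((m, r))
                break  # smallest sufficient reduce-slot count for this m
    return options


if __name__ == "__main__":
    # Made-up profile values, for illustration only.
    map_profile = StageProfile(tasks=100, avg_s=10, max_s=20)
    reduce_profile = StageProfile(tasks=10, avg_s=30, max_s=40)
    for m, r in provisioning_options(map_profile, reduce_profile, 200, 12):
        print(f"map slots={m}, reduce slots={r}")
```

Each returned pair is one candidate allocation; a user could then choose among them by cost or availability. A model with different stage structure or a different way of combining the bounds would change the numbers but not the overall shape of the search.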

Original language: English (US)
Title of host publication: Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings
Pages: 165-186
Number of pages: 22
DOIs: https://doi.org/10.1007/978-3-642-25821-3_9
State: Published - Dec 23 2011
Event: 12th ACM/IFIP/USENIX International Middleware Conference, Middleware 2011 - Lisbon, Portugal
Duration: Dec 12 2011 to Dec 16 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7049 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th ACM/IFIP/USENIX International Middleware Conference, Middleware 2011
Country: Portugal
City: Lisbon
Period: 12/12/11 to 12/16/11

Fingerprint

MapReduce
Resources
Completion Time
Cloud computing
Data mining
Marketing
Industry
Capacity Planning
Spam
Planning
Deadline
Vertex of a graph
Framework
Profiling
Model
Scaling
Scenarios
Target

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Verma, A., Cherkasova, L., & Campbell, R. H. (2011). Resource provisioning framework for MapReduce jobs with performance goals. In Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings (pp. 165-186). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7049 LNCS). https://doi.org/10.1007/978-3-642-25821-3_9

Resource provisioning framework for MapReduce jobs with performance goals. / Verma, Abhishek; Cherkasova, Ludmila; Campbell, Roy H.

Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings. 2011. p. 165-186 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7049 LNCS).

Verma, A, Cherkasova, L & Campbell, RH 2011, Resource provisioning framework for MapReduce jobs with performance goals. in Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7049 LNCS, pp. 165-186, 12th ACM/IFIP/USENIX International Middleware Conference, Middleware 2011, Lisbon, Portugal, 12/12/11. https://doi.org/10.1007/978-3-642-25821-3_9

Verma A, Cherkasova L, Campbell RH. Resource provisioning framework for MapReduce jobs with performance goals. In Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings. 2011. p. 165-186. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-25821-3_9

Verma, Abhishek ; Cherkasova, Ludmila ; Campbell, Roy H. / Resource provisioning framework for MapReduce jobs with performance goals. Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings. 2011. pp. 165-186 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{94720cb24d7e40e990c36352a005a5c2,
title = "Resource provisioning framework for MapReduce jobs with performance goals",
abstract = "Many companies are increasingly using MapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments is the amount of resources that a user should lease from the service provider. Often, a user targets specific performance goals and the application needs to complete data processing by a certain time deadline. However, currently, the task of estimating required resources to meet application performance goals is solely the users' responsibility. In this work, we introduce a novel framework and technique to address this problem and to offer a new resource sizing and provisioning service in MapReduce environments. For a MapReduce job that needs to be completed within a certain time, the job profile is built from the job past executions or by executing the application on a smaller data set using an automated profiling tool. Then, by applying scaling rules combined with a fast and efficient capacity planning model, we generate a set of resource provisioning options. Moreover, we design a model for estimating the impact of node failures on a job completion time to evaluate worst case scenarios. We validate the accuracy of our models using a set of realistic applications. The predicted completion times of generated resource provisioning options are within 10{\%} of the measured times in our 66-node Hadoop cluster.",
author = "Verma, Abhishek and Cherkasova, Ludmila and Campbell, {Roy H.}",
year = "2011",
month = "12",
day = "23",
doi = "10.1007/978-3-642-25821-3_9",
language = "English (US)",
isbn = "9783642258206",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "165--186",
booktitle = "Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings",

}

TY - GEN

T1 - Resource provisioning framework for MapReduce jobs with performance goals

AU - Verma, Abhishek

AU - Cherkasova, Ludmila

AU - Campbell, Roy H.

PY - 2011/12/23

Y1 - 2011/12/23

N2 - Many companies are increasingly using MapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments is the amount of resources that a user should lease from the service provider. Often, a user targets specific performance goals and the application needs to complete data processing by a certain time deadline. However, currently, the task of estimating required resources to meet application performance goals is solely the users' responsibility. In this work, we introduce a novel framework and technique to address this problem and to offer a new resource sizing and provisioning service in MapReduce environments. For a MapReduce job that needs to be completed within a certain time, the job profile is built from the job past executions or by executing the application on a smaller data set using an automated profiling tool. Then, by applying scaling rules combined with a fast and efficient capacity planning model, we generate a set of resource provisioning options. Moreover, we design a model for estimating the impact of node failures on a job completion time to evaluate worst case scenarios. We validate the accuracy of our models using a set of realistic applications. The predicted completion times of generated resource provisioning options are within 10% of the measured times in our 66-node Hadoop cluster.

UR - http://www.scopus.com/inward/record.url?scp=83755196778&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=83755196778&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-25821-3_9

DO - 10.1007/978-3-642-25821-3_9

M3 - Conference contribution

AN - SCOPUS:83755196778

SN - 9783642258206

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 165

EP - 186

BT - Middleware 2011 - ACM/IFIP/USENIX 12th International Middleware Conference, Proceedings

ER -