ARIA: Automatic resource inference and allocation for MapReduce environments

Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

MapReduce and Hadoop represent an economically compelling alternative for efficient large-scale data processing and advanced analytics in the enterprise. A key challenge in shared MapReduce clusters is the ability to automatically tailor and control resource allocations to different applications so that they achieve their performance goals. Currently, no job scheduler for MapReduce environments can, given a job completion deadline, allocate the appropriate amount of resources to the job so that it meets the required Service Level Objective (SLO). In this work, we propose a framework, called ARIA, to address this problem. It comprises three interrelated components. First, for a production job that is routinely executed on a new dataset, we build a job profile that compactly summarizes critical performance characteristics of the underlying application during the map and reduce stages. Second, we design a MapReduce performance model that, for a given job (with a known profile) and its SLO (a soft deadline), estimates the amount of resources required to complete the job within the deadline. Finally, we implement a novel SLO-based scheduler in Hadoop that determines job ordering and the amount of resources to allocate for meeting the job deadlines. We validate our approach using a set of realistic applications. The new scheduler effectively meets the jobs' SLOs until the job demands exceed the cluster resources. The results of the extensive simulation study are validated through detailed experiments on a 66-node Hadoop cluster.
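
As a concrete illustration of the approach the abstract outlines, the sketch below pairs the classic makespan bounds for n independent tasks on k slots (lower bound n·avg/k, upper bound (n−1)·avg/k + max) with a search for the smallest slot allocation whose conservative completion-time estimate meets the deadline. This is a minimal sketch, not the paper's implementation: the profile fields (n_map, map_avg, map_max, and so on), the flat additive shuffle term, and the brute-force search are assumptions made for the example.

from math import inf

def stage_upper_bound(n_tasks, n_slots, avg_dur, max_dur):
    # Classic makespan upper bound for n independent tasks on k slots:
    # (n - 1) * avg / k + max. The matching lower bound is n * avg / k.
    return (n_tasks - 1) * avg_dur / n_slots + max_dur

def estimated_completion(profile, map_slots, reduce_slots):
    # Conservative (upper-bound) estimate: map stage + shuffle + reduce stage.
    # Treating shuffle as a flat term is a simplification for this sketch.
    t_map = stage_upper_bound(profile["n_map"], map_slots,
                              profile["map_avg"], profile["map_max"])
    t_red = stage_upper_bound(profile["n_reduce"], reduce_slots,
                              profile["red_avg"], profile["red_max"])
    return t_map + profile["shuffle_avg"] + t_red

def min_allocation(profile, deadline, max_slots=256):
    # Smallest total (map + reduce) slot allocation whose conservative
    # estimate meets the deadline; brute-force search for clarity.
    best, best_total = None, inf
    for m in range(1, max_slots + 1):
        for r in range(1, max_slots + 1):
            if estimated_completion(profile, m, r) <= deadline:
                if m + r < best_total:
                    best, best_total = (m, r), m + r
                break  # first feasible r is the smallest for this m
    return best

# Hypothetical job profile (durations in seconds) and a 10-minute deadline.
profile = {"n_map": 400, "map_avg": 30.0, "map_max": 60.0,
           "n_reduce": 40, "red_avg": 90.0, "red_max": 180.0,
           "shuffle_avg": 45.0}
print(min_allocation(profile, deadline=600.0))  # prints (56, 35) here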

Original language: English (US)
Title of host publication: Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops
Pages: 235-244
Number of pages: 10
DOI: https://doi.org/10.1145/1998582.1998637
State: Published - Jul 15, 2011
Event: 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops - Karlsruhe, Germany
Duration: Jun 14, 2011 - Jun 18, 2011

Publication series

Name: Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops

Other

Other: 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops
Country: Germany
City: Karlsruhe
Period: 6/14/11 - 6/18/11

Keywords

  • mapreduce
  • modeling
  • resource allocation
  • scheduling

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Verma, A., Cherkasova, L., & Campbell, R. H. (2011). ARIA: Automatic resource inference and allocation for MapReduce environments. In Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops (pp. 235-244). (Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops). https://doi.org/10.1145/1998582.1998637

@inproceedings{ab91f817ee664c7c89cc4ca52b4258a3,
title = "ARIA: Automatic resource inference and allocation for MapReduce environments",
abstract = "MapReduce and Hadoop represent an economically compelling alternative for efficient large-scale data processing and advanced analytics in the enterprise. A key challenge in shared MapReduce clusters is the ability to automatically tailor and control resource allocations to different applications so that they achieve their performance goals. Currently, no job scheduler for MapReduce environments can, given a job completion deadline, allocate the appropriate amount of resources to the job so that it meets the required Service Level Objective (SLO). In this work, we propose a framework, called ARIA, to address this problem. It comprises three interrelated components. First, for a production job that is routinely executed on a new dataset, we build a job profile that compactly summarizes critical performance characteristics of the underlying application during the map and reduce stages. Second, we design a MapReduce performance model that, for a given job (with a known profile) and its SLO (a soft deadline), estimates the amount of resources required to complete the job within the deadline. Finally, we implement a novel SLO-based scheduler in Hadoop that determines job ordering and the amount of resources to allocate for meeting the job deadlines. We validate our approach using a set of realistic applications. The new scheduler effectively meets the jobs' SLOs until the job demands exceed the cluster resources. The results of the extensive simulation study are validated through detailed experiments on a 66-node Hadoop cluster.",
keywords = "mapreduce, modeling, resource allocation, scheduling",
author = "Abhishek Verma and Ludmila Cherkasova and Campbell, {Roy H.}",
year = "2011",
month = "7",
day = "15",
doi = "10.1145/1998582.1998637",
language = "English (US)",
isbn = "9781450306072",
series = "Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops",
pages = "235--244",
booktitle = "Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC 2011 and Co-located Workshops",

}
