SLO-driven right-sizing and resource provisioning of MapReduce jobs

Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A growing number of MapReduce applications, e.g., personalized advertising, spam detection, and real-time event log analysis, require completion time guarantees or must finish within a given time window. Currently, system administrators lack performance models and workload analysis tools for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger data set. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on a 66-node Hadoop cluster.
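
The scaling-factor step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the regression formulation, so everything here is an assumption made for illustration: a single-variable least-squares fit of a profile metric (average task duration) against input size, the hypothetical helper names `fit_scaling_factors` and `project_duration`, and the made-up sample measurements. It is not the authors' tool or model.

```python
# Illustrative sketch only: ordinary least squares with one predictor.
# Function names and sample numbers are hypothetical, not from the report.

def fit_scaling_factors(sizes, durations):
    """Fit duration ~ intercept + slope * size by least squares."""
    n = len(sizes)
    if n < 2 or n != len(durations):
        raise ValueError("need at least two (size, duration) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("input sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, durations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def project_duration(intercept, slope, size):
    """Project the profiled metric to a larger input size."""
    return intercept + slope * size


# Made-up profile: average task duration (seconds) measured on small inputs (GB).
sample_sizes_gb = [1.0, 2.0, 4.0, 8.0]
sample_durations_s = [12.0, 22.0, 42.0, 82.0]

intercept, slope = fit_scaling_factors(sample_sizes_gb, sample_durations_s)
projected = project_duration(intercept, slope, 32.0)
```

In this sketch the fitted `intercept` and `slope` play the role of scaling factors: they are derived from runs on small data sets and then reused to estimate the same metric for a larger one.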

Original language: English (US)
Title of host publication: HP Laboratories Technical Report
Edition: 126
State: Published - Aug 31 2011

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Verma, A., Cherkasova, L., & Campbell, R. H. (2011). SLO-driven right-sizing and resource provisioning of MapReduce jobs. In HP Laboratories Technical Report (126 ed.)