There is an increasing number of MapReduce applications (e.g., personalized advertising, spam detection, real-time event log analysis) that require completion time guarantees or must be completed within a given time window. Currently, system administrators lack performance models and workload analysis tools for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application runs or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger data set. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes the lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on a 66-node Hadoop cluster.
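The two modeling steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the helper names `fit_line` and `makespan_bounds` are hypothetical, the regression is ordinary least squares on one variable (input size versus a measured phase duration), and the bounds are the standard makespan bounds for greedy assignment of tasks to a fixed number of slots, which may differ from the model in the report.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Assumed use: xs are input data-set sizes from small profiling runs,
    ys are the measured durations of one job phase.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def makespan_bounds(num_tasks: int, avg_duration: float,
                    max_duration: float, slots: int) -> Tuple[float, float]:
    """Lower and upper bounds on the time to run num_tasks tasks on
    `slots` parallel slots under greedy assignment (assumed model).

    lower = n * avg / k
    upper = (n - 1) * avg / k + max
    """
    lower = num_tasks * avg_duration / slots
    upper = (num_tasks - 1) * avg_duration / slots + max_duration
    return lower, upper


if __name__ == "__main__":
    # Hypothetical profiling data: (input size in GB, phase duration in s).
    sizes = [1.0, 2.0, 3.0]
    durations = [3.0, 5.0, 7.0]
    intercept, slope = fit_line(sizes, durations)
    projected = intercept + slope * 10.0  # projection for a 10 GB input
    print(intercept, slope, projected)

    # Hypothetical stage: 100 tasks, 10 s average, 20 s maximum, 10 slots.
    print(makespan_bounds(100, 10.0, 20.0, 10))
```

Under these assumptions, the fitted line plays the role of the scaling factors, and the bounds function can be evaluated for different slot counts to compare provisioning options against a deadline.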
|Original language||English (US)|
|Title of host publication||HP Laboratories Technical Report|
|State||Published - Aug 31 2011|
ASJC Scopus subject areas
- Hardware and Architecture
- Computer Networks and Communications