Breaking the MapReduce stage barrier

Abhishek Verma, Brian Cho, Nicolas Zea, Indranil Gupta, Roy H. Campbell

Research output: Contribution to journal › Article

Abstract

The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework results in equivalent generality and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. This is due primarily to the interleaving of I/O and computation, and forgoing disk-intensive work. We achieve a reduction in job completion times that is 25% on average and 87% in the best case.
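The stage barrier described above can be illustrated with a small, single-process word-count sketch. It is not the authors' framework. The function names `map_words`, `barrier_reduce`, and `barrierless_reduce` are hypothetical, and the sketch assumes the reduce logic can be written as an incremental fold, meaning a function that merges one new value into a per-key partial result. The barrier version buffers and groups every map output before any reduce work starts. The barrier-less version updates a partial result as each record arrives, which is the kind of interleaving of computation with incoming data that the abstract identifies as the main source of the speedup.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical illustration only; not the paper's implementation.
Record = Tuple[str, int]


def map_words(line: str) -> List[Record]:
    """Map function: emit (word, 1) for each whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def barrier_reduce(records: Iterable[Record]) -> Dict[str, int]:
    """Barrier style: consume ALL map output first (the barrier),
    group and sort by key, then reduce each key's full value list."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:  # must finish before any reduce runs
        grouped[key].append(value)
    return {key: sum(values) for key, values in sorted(grouped.items())}


def barrierless_reduce(
    records: Iterable[Record],
    fold: Callable[[int, int], int],
    initial: int,
) -> Dict[str, int]:
    """Barrier-less style (assumes the reduce is an incremental fold):
    merge each record into a per-key partial result as soon as it
    arrives, with no global group-and-sort step."""
    partial: Dict[str, int] = {}
    for key, value in records:
        partial[key] = fold(partial.get(key, initial), value)
    return partial


if __name__ == "__main__":
    lines = ["the cat", "the dog"]
    # A generator produces map output lazily, so the barrier-less reducer
    # runs interleaved with mapping instead of waiting for it to finish.
    stream = (rec for line in lines for rec in map_words(line))
    print(barrierless_reduce(stream, lambda acc, v: acc + v, 0))
    # prints {'the': 2, 'cat': 1, 'dog': 1}
    print(barrier_reduce(rec for line in lines for rec in map_words(line)))
    # prints {'cat': 1, 'dog': 1, 'the': 2}
```

Folding records as they arrive removes the buffering and sort-and-group step for reducers such as sums, counts, and maxima. Reducers that need all values for a key at once (an exact median, for example) cannot be rewritten this simply; the abstract says the paper studies seven classes of applications, and how each class is handled is not reproduced in this sketch.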

Original language: English (US)
Pages (from-to): 191-206
Number of pages: 16
Journal: Cluster Computing
Volume: 16
Issue number: 1
DOI: 10.1007/s10586-011-0182-7
State: Published - Jan 1 2013

Keywords

  • Data-intensive computing
  • MapReduce

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Breaking the MapReduce stage barrier. / Verma, Abhishek; Cho, Brian; Zea, Nicolas; Gupta, Indranil; Campbell, Roy H.

In: Cluster Computing, Vol. 16, No. 1, 01.01.2013, p. 191-206.

Verma, Abhishek ; Cho, Brian ; Zea, Nicolas ; Gupta, Indranil ; Campbell, Roy H. / Breaking the MapReduce stage barrier. In: Cluster Computing. 2013 ; Vol. 16, No. 1. pp. 191-206.
@article{4e7c36576c8843b78545594875d29e92,
title = "Breaking the MapReduce stage barrier",
abstract = "The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework results in equivalent generality and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. This is due primarily to the interleaving of I/O and computation, and forgoing disk-intensive work. We achieve a reduction in job completion times that is 25{\%} on average and 87{\%} in the best case.",
keywords = "Data-intensive computing, MapReduce",
author = "Abhishek Verma and Brian Cho and Nicolas Zea and Indranil Gupta and Campbell, {Roy H.}",
year = "2013",
month = "1",
day = "1",
doi = "10.1007/s10586-011-0182-7",
language = "English (US)",
volume = "16",
pages = "191--206",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "1",

}

TY - JOUR

T1 - Breaking the MapReduce stage barrier

AU - Verma, Abhishek

AU - Cho, Brian

AU - Zea, Nicolas

AU - Gupta, Indranil

AU - Campbell, Roy H.

PY - 2013/1/1

Y1 - 2013/1/1

N2 - The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework results in equivalent generality and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. This is due primarily to the interleaving of I/O and computation, and forgoing disk-intensive work. We achieve a reduction in job completion times that is 25% on average and 87% in the best case.

AB - The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework results in equivalent generality and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. This is due primarily to the interleaving of I/O and computation, and forgoing disk-intensive work. We achieve a reduction in job completion times that is 25% on average and 87% in the best case.

KW - Data-intensive computing

KW - MapReduce

UR - http://www.scopus.com/inward/record.url?scp=84874793370&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874793370&partnerID=8YFLogxK

U2 - 10.1007/s10586-011-0182-7

DO - 10.1007/s10586-011-0182-7

M3 - Article

AN - SCOPUS:84874793370

VL - 16

SP - 191

EP - 206

JO - Cluster Computing

JF - Cluster Computing

SN - 1386-7857

IS - 1

ER -