Making cloud intermediate data fault-tolerant

Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil Gupta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completing the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. With no failures, the performance (i.e., job completion time) of Hadoop augmented with ISS is comparable to that of base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead relative to base Hadoop with no failures, depending on the testbed setup.

Original language: English (US)
Title of host publication: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10
Pages: 181-192
Number of pages: 12
State: Published - 2010
Event: 1st ACM Symposium on Cloud Computing, SoCC '10 - Indianapolis, IN, United States
Duration: Jun 6 2010 - Jun 11 2010

Publication series

Name: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10

Other

Other: 1st ACM Symposium on Cloud Computing, SoCC '10
Country/Territory: United States
City: Indianapolis, IN
Period: 6/6/10 - 6/11/10

Keywords

  • Interference minimization
  • Intermediate data
  • MapReduce
  • Replication

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
