Making cloud intermediate data fault-tolerant

Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil Gupta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet critical for the completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to treat intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. In the absence of failures, the job completion time of Hadoop augmented with ISS is comparable to that of base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop, incurring at most 18% overhead relative to failure-free base Hadoop, depending on the testbed setup.
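
The core idea above is to replicate short-lived intermediate data (e.g., map outputs) so that a run-time server failure does not force the job to recompute them, while keeping replication traffic from slowing the job itself. The sketch below is a minimal illustration of that idea, not the ISS implementation from the paper; all names (IntermediateStore, primary, replica) are hypothetical. A write completes locally first, replication runs asynchronously in the background, and a reader falls back to the replica when the primary copy is lost.

import shutil
import threading
from pathlib import Path


class IntermediateStore:
    """Sketch: keep intermediate (map-side) data available across a failure.

    A write finishes as soon as the local copy is on disk; replication to
    a peer happens asynchronously so the failure-free path stays fast.
    Illustrative only -- not the paper's ISS system.
    """

    def __init__(self, primary: Path, replica: Path) -> None:
        self.primary, self.replica = primary, replica
        for d in (primary, replica):
            d.mkdir(parents=True, exist_ok=True)
        self._pending = []  # in-flight replication threads

    def write(self, name: str, data: bytes) -> None:
        path = self.primary / name
        path.write_bytes(data)  # local write completes the task quickly
        t = threading.Thread(target=shutil.copy2,
                             args=(path, self.replica / name))
        t.start()  # replicate in the background
        self._pending.append(t)

    def flush(self) -> None:
        for t in self._pending:  # wait for in-flight replication
            t.join()
        self._pending.clear()

    def read(self, name: str) -> bytes:
        # Prefer the primary copy; fall back to the replica on "failure".
        for base in (self.primary, self.replica):
            if (base / name).exists():
                return (base / name).read_bytes()
        raise FileNotFoundError(name)


if __name__ == "__main__":
    store = IntermediateStore(Path("/tmp/iss_primary"),
                              Path("/tmp/iss_replica"))
    store.write("map-0.part", b"k1\t1\nk2\t3\n")
    store.flush()
    (store.primary / "map-0.part").unlink()  # simulate losing the server
    print(store.read("map-0.part"))          # served from the replica

In the paper's setting the replica would live on a different server, and replication would be scheduled to avoid contending with the job's own network traffic (the "Interference minimization" keyword below); the second directory here merely stands in for a remote machine.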

Original language: English (US)
Title of host publication: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10
Pages: 181-192
Number of pages: 12
ISBN (Print): 9781450300346
DOIs: 10.1145/1807128.1807160
State: Published - Jul 30 2010
Event: 1st ACM Symposium on Cloud Computing, SoCC '10 - Indianapolis, IN, United States
Duration: Jun 6 2010 - Jun 11 2010

Publication series

Name: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10

Other

Other: 1st ACM Symposium on Cloud Computing, SoCC '10
Country: United States
City: Indianapolis, IN
Period: 6/6/10 - 6/11/10

Keywords

  • Interference minimization
  • Intermediate data
  • MapReduce
  • Replication

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications

Cite this

Ko, S. Y., Hoque, I., Cho, B., & Gupta, I. (2010). Making cloud intermediate data fault-tolerant. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10 (pp. 181-192). (Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10). https://doi.org/10.1145/1807128.1807160
