TY - GEN
T1 - Making cloud intermediate data fault-tolerant
AU - Ko, Steven Y.
AU - Hoque, Imranul
AU - Cho, Brian
AU - Gupta, Indranil
PY - 2010
Y1 - 2010
N2 - Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. Under no failure, the performance of Hadoop augmented with ISS (i.e., job completion time) turns out to be comparable to base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead compared to base no-failure Hadoop, depending on the testbed setup.
AB - Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. Under no failure, the performance of Hadoop augmented with ISS (i.e., job completion time) turns out to be comparable to base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead compared to base no-failure Hadoop, depending on the testbed setup.
KW - Interference minimization
KW - Intermediate data
KW - MapReduce
KW - Replication
UR - http://www.scopus.com/inward/record.url?scp=77954901713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954901713&partnerID=8YFLogxK
U2 - 10.1145/1807128.1807160
DO - 10.1145/1807128.1807160
M3 - Conference contribution
AN - SCOPUS:77954901713
SN - 9781450300346
T3 - Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10
SP - 181
EP - 192
BT - Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10
T2 - 1st ACM Symposium on Cloud Computing, SoCC '10
Y2 - 6 June 2010 through 11 June 2010
ER -