On availability of intermediate data in cloud computations

Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil Gupta

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper takes a renewed look at the problem of managing intermediate data that is generated during dataflow computations (e.g., MapReduce, Pig, Dryad) within clouds. We discuss salient features of this intermediate data and outline requirements for a solution. Our experiments show that existing local write-remote read solutions, traditional distributed file systems (e.g., HDFS), and support from transport protocols (e.g., TCP-Nice) cannot guarantee both data availability and minimal interference, which are our key requirements. We present design ideas for a new intermediate data storage system.

Original language: English (US)
State: Published - 2009
Event: 12th Workshop on Hot Topics in Operating Systems, HotOS 2009 - Monte Verita, Switzerland
Duration: May 18 2009 to May 20 2009

Conference

Conference: 12th Workshop on Hot Topics in Operating Systems, HotOS 2009
Country/Territory: Switzerland
City: Monte Verita
Period: 5/18/09 to 5/20/09

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Computer Networks and Communications
