Evaluating storage systems for scientific data in the cloud

Ketan Maheshwari, Justin M. Wozniak, Hao Yang, Daniel S Katz, Matei Ripeanu, Victor Zavala, Michael Wilde

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a variety of POSIX-accessible distributed storage systems to manage data access patterns resulting from workflow application executions in the cloud. We leverage the expressivity of the Swift parallel scripting framework to benchmark the performance of a number of storage systems using synthetic workloads and three real-world applications. We characterize two representative commercial storage systems (Amazon S3 and HDFS) and two emerging research-based storage systems (Chirp/Parrot and MosaStore). We find the use of aggregated node-local resources effective and economical compared with remotely located S3 storage. Our experiments show that applications run at scale with MosaStore achieve up to a 30% improvement in makespan compared with those run with S3. We also find that storage-system-driven application deployments in the cloud result in better runtime performance than an on-demand, data-staging-driven approach.

Original language: English (US)
Title of host publication: ScienceCloud 2014 - Proceedings of the 2014 ACM International Workshop on Scientific Cloud Computing, Co-located with HPDC 2014
Publisher: Association for Computing Machinery
Pages: 33-40
Number of pages: 8
ISBN (Print): 9781450329118
State: Published - 2014
Externally published: Yes
Event: 5th ACM Workshop on Scientific Cloud Computing, ScienceCloud 2014 - Vancouver, BC, Canada
Duration: Jun 23 2014 - Jun 27 2014

Publication series

Name: ScienceCloud 2014 - Proceedings of the 2014 ACM International Workshop on Scientific Cloud Computing, Co-located with HPDC 2014

Other

Other: 5th ACM Workshop on Scientific Cloud Computing, ScienceCloud 2014
Country/Territory: Canada
City: Vancouver, BC
Period: 6/23/14 - 6/27/14

Keywords

  • Cloud
  • Distributed computing
  • Storage systems

ASJC Scopus subject areas

  • Software
