MiDas: Containerizing Data-Intensive Applications with I/O Specialization

Chaitra Niddodi, Ashish Gehani, Tanu Malik, Jorge A. Navas, Sibin Mohan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific applications often depend on data produced by computational models. Model-generated data can be prohibitively large. Current mechanisms for sharing and distributing reproducible applications, such as containers, assume that all model data is saved and included with a program to support its successful re-execution. However, including model data increases container size, which in turn increases the cost and time required for deployment and further reuse. We present a framework named MiDas ("Minimizing Datasets") for specializing I/O libraries which, given an application, automates the process of identifying and including only the subset of the data accessed by the program. To do this, MiDas combines static and dynamic analysis techniques to map high-level user inputs to low-level file offsets. We show a reduction in data size of several orders of magnitude via specialization of I/O libraries associated with model-based data-intensive applications, such as those operating on meteorological and geophysical data.
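
The idea of mapping high-level inputs to low-level file offsets can be illustrated with a minimal Python sketch. This is not the MiDas implementation, and it covers only a simple dynamic-tracing step with no static analysis. The names `TracingReader`, `read_record`, `merge_ranges`, and `extract_subset`, as well as the fixed 8-byte record layout, are assumptions made up for this example.

```python
import io


class TracingReader:
    """Wraps a binary file object and records every byte range that is read."""

    def __init__(self, raw):
        self.raw = raw
        self.ranges = []  # half-open (start, end) byte ranges

    def seek(self, offset, whence=io.SEEK_SET):
        return self.raw.seek(offset, whence)

    def read(self, size):
        start = self.raw.tell()
        data = self.raw.read(size)
        if data:
            self.ranges.append((start, start + len(data)))
        return data


def read_record(reader, index, record_size=8):
    """Hypothetical high-level access: fetch fixed-size record number `index`."""
    reader.seek(index * record_size)
    return reader.read(record_size)


def merge_ranges(ranges):
    """Coalesce overlapping or adjacent byte ranges into a sorted list."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extract_subset(data, ranges):
    """Keep only the bytes that were accessed, keyed by their start offset."""
    return {start: data[start:end] for start, end in ranges}


# Toy dataset (assumed layout): 1000 big-endian records of 8 bytes each.
dataset = b"".join(i.to_bytes(8, "big") for i in range(1000))

# The "user inputs" here are record indices; the trace yields file offsets.
reader = TracingReader(io.BytesIO(dataset))
for index in (3, 4, 500):
    read_record(reader, index)

needed = merge_ranges(reader.ranges)
subset = extract_subset(dataset, needed)
kept = sum(len(chunk) for chunk in subset.values())
print(f"kept {kept} of {len(dataset)} bytes in ranges {needed}")
```

In this toy run, three record indices translate into two byte ranges, and only those ranges would need to be packaged with the program. A real tool would also have to handle file formats with headers, indexes, and compression, which this sketch ignores.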

Original language: English (US)
Title of host publication: P-RECS 2020 - Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems
Publisher: Association for Computing Machinery
Pages: 21-26
Number of pages: 6
ISBN (Electronic): 9781450379779
State: Published - Jun 23 2020
Event: 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS 2020 - Stockholm, Sweden
Duration: Jun 23 2020 → …

Publication series

Name: P-RECS 2020 - Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems

Conference

Conference: 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS 2020
Country/Territory: Sweden
City: Stockholm
Period: 6/23/20 → …

Keywords

  • I/O specialization
  • containers
  • data-intensive

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
