IOSPReD: I/O Specialized Packaging of Reduced Datasets and Data-Intensive Applications for Efficient Reproducibility

Chaitra Niddodi, Ashish Gehani, Tanu Malik, Sibin Mohan, Michael Lee Rilee

Research output: Contribution to journal › Article › peer-review

Abstract

The data generated by large-scale scientific systems such as NASA's Earth Observing System Data and Information System is expected to increase substantially. Consequently, applications that process these huge volumes of data suffer from a lack of storage space at the execution site. This poses a critical challenge when sharing data and reproducing application executions with respect to specific user inputs in data-intensive applications. To address this issue, we propose IOSPReD (I/O Specialized Packaging of Reduced Datasets), a data-based debloating framework designed to automatically track and package only the necessary chunks of data (along with the application) in a container. IOSPReD uses the specific inputs provided by the user to identify the necessary data chunks. To do so, the high-level user inputs are mapped down to low-level data file offsets. We evaluate IOSPReD on different realistic NASA datasets to assess (i) the amount of data reduction, (ii) the reproducibility of results across multiple application executions, and (iii) the impact on performance.
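
The idea of mapping high-level inputs down to file offsets can be illustrated with a minimal sketch. This is not IOSPReD's implementation: it assumes a flat, row-major 2D array of fixed-size elements stored in a single file, whereas real scientific data formats have more involved layouts. The function names `byte_ranges_for_slice` and `chunks_touched`, the chunk size, and the array shape are all hypothetical.

```python
from typing import List, Tuple


def byte_ranges_for_slice(
    n_cols: int,
    itemsize: int,
    row_start: int,
    row_stop: int,
    col_start: int,
    col_stop: int,
    header_bytes: int = 0,
) -> List[Tuple[int, int]]:
    """Map a 2D index selection to (offset, length) byte ranges.

    Assumes a row-major array with n_cols columns of itemsize bytes each,
    preceded by header_bytes of metadata. Ranges are half-open:
    rows [row_start, row_stop), columns [col_start, col_stop).
    """
    run_length = (col_stop - col_start) * itemsize
    ranges = []
    for row in range(row_start, row_stop):
        # Each selected row contributes one contiguous run of bytes.
        offset = header_bytes + (row * n_cols + col_start) * itemsize
        ranges.append((offset, run_length))
    return ranges


def chunks_touched(ranges: List[Tuple[int, int]], chunk_size: int) -> List[int]:
    """Return the sorted indices of fixed-size chunks that overlap any range."""
    needed = set()
    for offset, length in ranges:
        if length <= 0:
            continue
        first = offset // chunk_size
        last = (offset + length - 1) // chunk_size
        needed.update(range(first, last + 1))
    return sorted(needed)


# Hypothetical request: rows 2-3 and columns 10-19 of a 100x100 float32 array.
ranges = byte_ranges_for_slice(
    n_cols=100, itemsize=4, row_start=2, row_stop=4, col_start=10, col_stop=20
)
needed = chunks_touched(ranges, chunk_size=256)
print(ranges)
print(needed)
```

Under these assumptions, only the chunks listed in `needed` would have to be packaged alongside the application to reproduce a run with this particular input; the rest of the file could be left out.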

Original language: English (US)
Pages (from-to): 1718-1731
Number of pages: 14
Journal: IEEE Access
Volume: 11
DOIs
State: Published - 2023
Externally published: Yes

Keywords

  • Data management
  • I/O specialization
  • containerization
  • data-based debloating
  • data-intensive applications
  • reproducibility

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
