EMPRESS: Accelerating Scientific Discovery through Descriptive Metadata Management

Margaret Lawson, William Gropp, Jay Lofstead

Research output: Contribution to journal › Article › peer-review

Abstract

High-performance computing scientists are producing unprecedented volumes of data that take a long time to load for analysis. However, many analyses only require loading the data containing particular features of interest, and scientists have many approaches for identifying these features. Therefore, if scientists store information (descriptive metadata) about these identified features, then for subsequent analyses they can use this information to read in only the data containing these features. This can greatly reduce the amount of data that scientists have to read in, thereby accelerating analysis. Despite the potential benefits of descriptive metadata management, no prior work has created a descriptive metadata system that can help scientists working with a wide range of applications and analyses to restrict their reads to data containing features of interest. In this article, we present EMPRESS, the first such solution. EMPRESS offers all of the features needed to help accelerate discovery: it can accelerate analysis by up to 300×, supports a wide range of applications and analyses, is high-performing, is highly scalable, and requires minimal storage space. In addition, EMPRESS offers features required for a production-oriented system: scalable metadata consistency techniques, flexible system configurations, fault tolerance as a service, and portability.
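
The idea in the abstract (tag data with the features it contains, then read only the tagged data) can be illustrated with a minimal sketch. This is not EMPRESS's API or implementation: `FeatureTag`, `build_catalog`, `chunks_with`, `detect_hot_spot`, the chunk layout, and the threshold are all hypothetical names and values chosen for illustration, and an in-memory dictionary stands in for data on storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureTag:
    """Hypothetical descriptive-metadata record: a feature name and the chunk holding it."""
    feature: str
    chunk_id: int


def build_catalog(chunks, detect):
    """Scan every chunk once and record which features each chunk contains."""
    return [
        FeatureTag(name, chunk_id)
        for chunk_id, data in chunks.items()
        for name in detect(data)
    ]


def chunks_with(catalog, feature):
    """Return the sorted ids of the chunks tagged with the given feature."""
    return sorted({tag.chunk_id for tag in catalog if tag.feature == feature})


def detect_hot_spot(data, threshold=100.0):
    """Toy feature detector (an assumption): flag chunks whose maximum exceeds a threshold."""
    return ["hot_spot"] if max(data) > threshold else []


# Stand-in for a large dataset split into chunks on storage.
chunks = {0: [1.0, 2.0], 1: [150.0, 3.0], 2: [4.0, 5.0], 3: [2.0, 101.5]}

# First analysis: identify features and store the descriptive metadata.
catalog = build_catalog(chunks, detect_hot_spot)

# Later analysis: consult the metadata and load only the matching chunks.
to_read = chunks_with(catalog, "hot_spot")
subset = {chunk_id: chunks[chunk_id] for chunk_id in to_read}
```

In this toy case the later analysis touches two of the four chunks. The savings in a real system depend on how rare the features are and on the cost of querying the metadata.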

Original language: English (US)
Article number: 3523698
Journal: ACM Transactions on Storage
Volume: 18
Issue number: 4
State: Published - Dec 12 2022

Keywords

  • ATDM
  • Decaf
  • Descriptive metadata
  • EMPRESS
  • HDF5
  • accelerating scientific discovery
  • data tagging
  • high-level indexing

ASJC Scopus subject areas

  • Hardware and Architecture
