TY - JOUR
T1 - EMPRESS
T2 - Accelerating Scientific Discovery through Descriptive Metadata Management
AU - Lawson, Margaret
AU - Gropp, William
AU - Lofstead, Jay
N1 - Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2022/12/12
Y1 - 2022/12/12
N2 - High-performance computing scientists are producing unprecedented volumes of data that take a long time to load for analysis. However, many analyses only require loading in the data containing particular features of interest and scientists have many approaches for identifying these features. Therefore, if scientists store information (descriptive metadata) about these identified features, then for subsequent analyses they can use this information to only read in the data containing these features. This can greatly reduce the amount of data that scientists have to read in, thereby accelerating analysis. Despite the potential benefits of descriptive metadata management, no prior work has created a descriptive metadata system that can help scientists working with a wide range of applications and analyses to restrict their reads to data containing features of interest. In this article, we present EMPRESS, the first such solution. EMPRESS offers all of the features needed to help accelerate discovery: It can accelerate analysis by up to 300×, supports a wide range of applications and analyses, is high-performing, is highly scalable, and requires minimal storage space. In addition, EMPRESS offers features required for a production-oriented system: scalable metadata consistency techniques, flexible system configurations, fault tolerance as a service, and portability.
KW - ATDM
KW - Decaf
KW - Descriptive metadata
KW - EMPRESS
KW - HDF5
KW - accelerating scientific discovery
KW - data tagging
KW - high-level indexing
UR - http://www.scopus.com/inward/record.url?scp=85146426732&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146426732&partnerID=8YFLogxK
U2 - 10.1145/3523698
DO - 10.1145/3523698
M3 - Article
AN - SCOPUS:85146426732
SN - 1553-3077
VL - 18
JO - ACM Transactions on Storage
JF - ACM Transactions on Storage
IS - 4
M1 - 3523698
ER -