TY - GEN
T1 - NSDF-Catalog
T2 - 15th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2022
AU - Luettgau, Jakob
AU - Scorzelli, Giorgio
AU - Pascucci, Valerio
AU - Tarcea, Glenn
AU - Kirkpatrick, Christine R.
AU - Taufer, Michela
N1 - Acknowledgment: This research was supported by the National Science Foundation (NSF) under grant numbers #1841758, #2028923, #2103845 and #2138811; the Extreme Science and Engineering Discovery Environment (XSEDE) under allocation TG-CIS210128; Chameleon Cloud under allocation CHI-210923; and IBM through a Shared University Research Award.
PY - 2022
Y1 - 2022
N2 - Across domains, massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is a challenge, especially for scientists who have not generated the data or are from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we developed a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata collections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice to: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at a fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.
AB - Across domains, massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is a challenge, especially for scientists who have not generated the data or are from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we developed a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata collections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice to: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at a fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.
KW - cloud
KW - high performance computing
KW - national science data fabric
KW - scientific data
UR - http://www.scopus.com/inward/record.url?scp=85150676284&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150676284&partnerID=8YFLogxK
U2 - 10.1109/UCC56403.2022.00011
DO - 10.1109/UCC56403.2022.00011
M3 - Conference contribution
AN - SCOPUS:85150676284
T3 - Proceedings - 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing, UCC 2022
SP - 1
EP - 10
BT - Proceedings - 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing, UCC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 December 2022 through 9 December 2022
ER -