NSDF-Catalog: Lightweight Indexing Service for Democratizing Data Delivery

Jakob Luettgau, Giorgio Scorzelli, Valerio Pascucci, Glenn Tarcea, Christine R. Kirkpatrick, Michela Taufer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Across domains, massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is a challenge, especially for scientists who have not generated the data or are from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we developed a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata collections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at a fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing, UCC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781665460873
State: Published - 2022
Externally published: Yes
Event: 15th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2022 - Vancouver, United States
Duration: Dec 6 2022 to Dec 9 2022

Publication series

Name: Proceedings - 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing, UCC 2022

Conference

Conference: 15th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2022
Country/Territory: United States
City: Vancouver
Period: 12/6/22 to 12/9/22

Keywords

  • cloud
  • high performance computing
  • national science data fabric
  • scientific data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Health Informatics
