Design of an observatory data system and open source data sharing with active curation and machine learning

Praveen Kumar, Luigi Marini, Michelle Pitcel, Laura Keefer, Kenton McHenry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Over the past six years, IMLCZO (Intensively Managed Landscapes Critical Zone Observatory) has contributed to, and benefited from, the development and deployment of a system based on the Clowder platform that supports heterogeneous scientific data. Scientific data is often highly heterogeneous. Within geoscience alone, data spans time series, geospatial data, remote sensing, geophysical images, geophysical and geochemical laboratory analyses, experimental outcomes, and images, to name a few. For such data to be usable by others, large collections spanning these types, some of them unstructured, must be annotated and/or processed into more readily usable products. As datasets grow larger, which is increasingly the case, computational capabilities co-located with the data also become essential for usability: they spare users from having to download the data or find a sufficiently powerful local computational resource to run their analyses. Clowder, an open source data management framework built on the notion of Active Curation, provides machine learning and other analysis-based tools to facilitate the annotation of large, broad, and unstructured datasets. Because it is customizable from the ground up, Clowder can be deployed at local institutions for specific scientific needs or remotely on cloud/HPC resources, extended to meet new data visualization and analysis needs, used to run custom analyses near the data where it resides, and made to interoperate with other data infrastructure components, e.g., for long-term archiving. Clowder has been used to support the data sharing and processing needs of a broad range of communities spanning geoscience, biology, materials science, medicine, social science, cultural heritage, and the arts. This presentation will describe the design of this data system and how it supports scientific advancement.
Original language: English (US)
Title of host publication: Abstracts with Programs - Geological Society of America
Place of Publication: Phoenix, Arizona
Publisher: Geological Society of America
State: Published - 2019


  • ISWS

