TY - GEN
T1 - Clowder
T2 - 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
AU - Marini, Luigi
AU - Satheesan, Sandeep Puthanveetil
AU - Nicholson, Todd
AU - Gutierrez-Polo, Indira
AU - Burnette, Maxwell
AU - Zhao, Yan
AU - Kooper, Rob
AU - Lee, Jong
AU - McHenry, Kenton
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
PY - 2018/7/22
Y1 - 2018/7/22
AB - Clowder is an open source data management system to support data curation of long-tail data and metadata across multiple research domains and diverse data types. Institutions and labs can install and customize their own instance of the framework on local hardware or on remote cloud computing resources to provide a shared service to distributed communities of researchers. Data can be ingested directly from instruments or manually uploaded by users and then shared with remote collaborators using a web front end. We discuss some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas including digital preservation, geoscience, material science, medicine, social science, cultural heritage and the arts. Some of these challenges include support for large amounts of data, horizontal scaling of domain-specific preprocessing algorithms, ability to provide new data visualizations in the web browser, a comprehensive Web service API for automatic data ingestion and curation, a suite of social annotation and metadata management features to support data annotation by communities of users and algorithms, and a web-based front end to interact with code running on heterogeneous clusters, including HPC resources.
KW - Data curation
KW - Data management
KW - Linked data
KW - Metadata management
KW - Scientific gateways
UR - http://www.scopus.com/inward/record.url?scp=85051432133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051432133&partnerID=8YFLogxK
U2 - 10.1145/3219104.3219159
DO - 10.1145/3219104.3219159
M3 - Conference contribution
AN - SCOPUS:85051432133
SN - 9781450364461
T3 - ACM International Conference Proceeding Series
BT - Practice and Experience in Advanced Research Computing 2018
PB - Association for Computing Machinery
Y2 - 22 July 2018 through 26 July 2018
ER -