Kurator: A Kepler package for data curation workflows

  • L. Dou
  • G. Cao
  • P. J. Morris
  • R. A. Morris
  • B. Ludäscher
  • J. A. Macklin
  • J. Hanken

Research output: Contribution to journal > Conference article > peer-review

Abstract

Data curation is critical for scientific data digitization, sharing, integration, and use. This paper presents Kurator, a software package for automating data curation pipelines in the Kepler scientific workflow system. Several curation tools and services are integrated into this package as actors, enabling the construction of workflows that perform and document various data curation tasks. The integration of Google cloud services (e.g., Google spreadsheets) allows workflow steps to invoke human experts outside the workflow in a manner that greatly simplifies the complex data handling in distributed, multi-user curation workflows. The Kepler platform provides the modeling, execution, and management capabilities for the curation package, including a collection-oriented model of computation (COMAD) and provenance tracking and browsing. These features not only allow workflows to be easily modeled, maintained, and evolved, but also facilitate QA/QC of curation results through examination of provenance information recorded during workflow execution. The effectiveness of the Kurator package is demonstrated through a workflow for data curation of natural science collections.

Original language: English (US)
Pages (from-to): 1614-1619
Number of pages: 6
Journal: Procedia Computer Science
Volume: 9
State: Published - 2012
Externally published: Yes
Event: 12th Annual International Conference on Computational Science, ICCS 2012 - Omaha, NB, United States
Duration: Jun 4 2012 to Jun 6 2012

Keywords

  • Biodiversity informatics
  • Data curation
  • Scientific workflows

ASJC Scopus subject areas

  • General Computer Science
