Managing scientific data: From data integration to scientific workflows

Bertram Ludäscher, Kai Lin, Shawn Bowers, Efrat Jaeger-Frank, Boyan Brodaric, Chaitan Baru

Research output: Contribution to journal › Article › peer-review

Abstract

Scientists are confronted with significant data management problems due to the large volume and high complexity of scientific data. In particular, the latter makes data integration a difficult technical challenge. In this paper, we describe our work on semantic mediation and scientific workflows and discuss how these technologies address integration challenges in scientific data management. We first give an overview of the main data integration problems that arise from heterogeneity in the syntax, structure, and semantics of data. Starting from a traditional mediator approach, we show how semantic extensions can facilitate data integration in complex, multiple-world scenarios, where data sources cover different but related scientific domains. Such scenarios are not amenable to conventional schema integration approaches. The core idea of semantic mediation is to augment database mediators and query evaluation algorithms with appropriate knowledge representation techniques to exploit information from shared ontologies. Semantic mediation relies on semantic data registration, which associates existing data with semantic information from an ontology. The Kepler scientific workflow system addresses the problem of synthesizing reusable workflow components and analytical pipelines from existing tools and applications to automate scientific analyses. After presenting core features and example workflows in Kepler, we present a framework for adding semantic information to scientific workflows. The resulting system is aware of semantically plausible connections between workflow components, as well as between data sources and workflow components. The scientist can use this information during workflow design, and the workflow engineer can use it to create data transformation steps between analytical steps that are semantically compatible but structurally incompatible.
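
The ontology-based reasoning behind semantic mediation and semantically plausible workflow connections can be illustrated with a small Python sketch. This is a hypothetical toy example, not the authors' implementation or the Kepler API. The rock ontology, the two data sources, the registration mappings, and the functions `is_subconcept`, `mediated_query`, and `ports_compatible` are all assumptions introduced for illustration. Registration links raw values in differently structured sources to shared ontology concepts. A query posed against a concept then retrieves records whose registered concepts fall under it, and an output port is treated as plausibly connectable to an input port when its concept is subsumed by the input's concept.

```python
# Hypothetical toy ontology: each concept maps to its direct parent (is-a link).
ONTOLOGY = {
    "Rock": None,
    "IgneousRock": "Rock",
    "Granite": "IgneousRock",
    "Basalt": "IgneousRock",
    "SedimentaryRock": "Rock",
    "Sandstone": "SedimentaryRock",
}


def is_subconcept(child, parent):
    """True if `child` equals `parent` or lies below it in the is-a hierarchy."""
    node = child
    while node is not None:
        if node == parent:
            return True
        node = ONTOLOGY[node]
    return False


# Two hypothetical sources with different structure and vocabulary.
SOURCE_A = [
    {"sample": "a1", "lithology": "granite"},
    {"sample": "a2", "lithology": "sandstone"},
]
SOURCE_B = [{"id": "b7", "rock_code": "BAS"}]

# Hypothetical semantic registration: which field holds the data and
# which ontology concept each local value denotes.
REGISTRATION = {
    "A": {"field": "lithology", "values": {"granite": "Granite", "sandstone": "Sandstone"}},
    "B": {"field": "rock_code", "values": {"BAS": "Basalt"}},
}


def mediated_query(concept, sources):
    """Return (source, record) pairs whose registered concept falls under `concept`."""
    results = []
    for name, rows in sources.items():
        reg = REGISTRATION[name]
        for row in rows:
            registered = reg["values"].get(row[reg["field"]])
            if registered is not None and is_subconcept(registered, concept):
                results.append((name, row))
    return results


def ports_compatible(output_concept, input_concept):
    """A connection is semantically plausible if the output concept is subsumed by the input concept."""
    return is_subconcept(output_concept, input_concept)


if __name__ == "__main__":
    # The granite record from A and the basalt record from B both match "IgneousRock".
    print(mediated_query("IgneousRock", {"A": SOURCE_A, "B": SOURCE_B}))
    print(ports_compatible("Granite", "IgneousRock"))    # True
    print(ports_compatible("Sandstone", "IgneousRock"))  # False
```

The query for `IgneousRock` never names granite or basalt directly. It reaches both sources only through the registered concepts and the ontology's is-a links, which is the kind of benefit semantic mediation aims for when sources use different local vocabularies.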

Original language: English (US)
Pages (from-to): 109-129
Number of pages: 21
Journal: Special Paper of the Geological Society of America
Volume: 397
State: Published - 2006
Externally published: Yes

Keywords

  • Data integration
  • Ontologies
  • Scientific data management
  • Scientific workflows

ASJC Scopus subject areas

  • Geology
