TY - JOUR
T1 - Managing scientific data
T2 - From data integration to scientific workflows
AU - Ludäscher, Bertram
AU - Lin, Kai
AU - Bowers, Shawn
AU - Jaeger-Frank, Efrat
AU - Brodaric, Boyan
AU - Baru, Chaitan
PY - 2006
Y1 - 2006
AB - Scientists are confronted with significant data management problems due to the large volume and high complexity of scientific data. In particular, the latter makes data integration a difficult technical challenge. In this paper, we describe our work on semantic mediation and scientific workflows and discuss how these technologies address integration challenges in scientific data management. We first give an overview of the main data integration problems that arise from heterogeneity in the syntax, structure, and semantics of data. Starting from a traditional mediator approach, we show how semantic extensions can facilitate data integration in complex, multiple-world scenarios, where data sources cover different but related scientific domains. Such scenarios are not amenable to conventional schema integration approaches. The core idea of semantic mediation is to augment database mediators and query evaluation algorithms with appropriate knowledge representation techniques to exploit information from shared ontologies. Semantic mediation relies on semantic data registration, which associates existing data with semantic information from an ontology. The Kepler scientific workflow system addresses the problem of synthesizing, from existing tools and applications, reusable workflow components and analytical pipelines to automate scientific analyses. After presenting core features and example workflows in Kepler, we present a framework for adding semantic information to scientific workflows. The resulting system is aware of semantically plausible connections between workflow components as well as between data sources and workflow components. This information can be used by the scientist during workflow design, and by the workflow engineer, for creating data transformation steps between semantically compatible but structurally incompatible analytical steps.
KW - Data integration
KW - Ontologies
KW - Scientific data management
KW - Scientific workflows
UR - http://www.scopus.com/inward/record.url?scp=73849084670&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=73849084670&partnerID=8YFLogxK
U2 - 10.1130/2006.2397(08)
DO - 10.1130/2006.2397(08)
M3 - Article
AN - SCOPUS:73849084670
SN - 0072-1077
VL - 397
SP - 109
EP - 129
JO - Special Paper of the Geological Society of America
JF - Special Paper of the Geological Society of America
ER -