INTEGRATE-KG: A Workflow For Unifying Heterogeneous Data Driven by Shared Languages

Nahed Abu Zaid, Kara Schatz, Kimberly Bourne, Darrell Harry, Christine Hendren, Anna Maria Marshall, Khara Grieger, Jacob Jones, Alexey V. Gulyuk, Yaroslava G. Yingling, Rada Chirkova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In large-scale multidisciplinary consortia endeavors that address problems of research, industry, and public-good significance, it is typically a priority to integrate the heterogeneous data contributed by the consortia participants into a unified data representation. Knowledge graphs (KGs) are a typical choice for the data model of the resulting data repositories. To overcome potential issues with terminology misalignment, consortia commonly dedicate resources to the development of shared languages (vocabularies), with the intent of enabling diverse participants to understand and build on each other's work. Our research focus in this paper is on the challenge of automating integration into unified KGs of diverse data that potentially use different terminology, with the help of the available shared languages to resolve terminology clashes. To address the challenge, we introduce a data-integration workflow called INTEGRATE-KG that is domain agnostic, yet domain aware through opportunities for the involvement of humans-in-the-loop. A key feature of the approach is in its use of the synonyms available for the shared languages to automate semantics-level terminology alignment across the individual data contributions after they have been submitted for integration. INTEGRATE-KG also includes a module for automatically enriching the available shared languages, with opportunities for domain experts to provide semantic corrections and feedback. We present the workflow, report on our experiences with applying it to experimental, survey, and shared-language data on phosphorus sustainability, and provide suggestions for involving domain experts in INTEGRATE-KG as humans-in-the-loop.
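
The synonym-driven alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary entries, the function names (`normalize`, `build_synonym_index`, `align_terms`), and the exact-match lookup are all assumptions made for illustration. The sketch maps each submitted term to a canonical shared-language term through its listed synonyms and sets aside unmatched terms, which is where a human-in-the-loop review could plausibly take place.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shared language: canonical term -> known synonyms.
# These entries are illustrative only and are not taken from the paper.
SHARED_LANGUAGE: Dict[str, List[str]] = {
    "phosphorus": ["P", "elemental phosphorus"],
    "wastewater": ["waste water", "effluent"],
}


def normalize(term: str) -> str:
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_synonym_index(shared_language: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every normalized canonical term and synonym to its canonical term."""
    index: Dict[str, str] = {}
    for canonical, synonyms in shared_language.items():
        index[normalize(canonical)] = canonical
        for synonym in synonyms:
            index[normalize(synonym)] = canonical
    return index


def align_terms(
    terms: Iterable[str], index: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (term -> canonical term) plus the terms with no match.

    Unmatched terms are returned rather than guessed, so that a domain
    expert can review them (an assumed hand-off point in this sketch).
    """
    aligned: Dict[str, str] = {}
    unresolved: List[str] = []
    for term in terms:
        canonical = index.get(normalize(term))
        if canonical is None:
            unresolved.append(term)
        else:
            aligned[term] = canonical
    return aligned, unresolved


if __name__ == "__main__":
    index = build_synonym_index(SHARED_LANGUAGE)
    aligned, unresolved = align_terms(["Waste Water", "P", "struvite"], index)
    print(aligned)
    print(unresolved)
```

Exact matching on normalized strings is the simplest possible choice here; a workflow such as the one the abstract describes may well rely on richer, semantics-level matching.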

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3522-3531
Number of pages: 10
ISBN (Electronic): 9798350362480
DOIs
State: Published - 2024
Event: 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
Duration: Dec 15 2024 → Dec 18 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024

Conference

Conference: 2024 IEEE International Conference on Big Data, BigData 2024
Country/Territory: United States
City: Washington
Period: 12/15/24 → 12/18/24

Keywords

  • Knowledge graphs for big scientific and experimental data
  • knowledge-graph applications
  • knowledge-graph construction

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Modeling and Simulation
