TY - GEN
T1 - INTEGRATE-KG
T2 - 2024 IEEE International Conference on Big Data, BigData 2024
AU - Zaid, Nahed Abu
AU - Schatz, Kara
AU - Bourne, Kimberly
AU - Harry, Darrell
AU - Hendren, Christine
AU - Marshall, Anna Maria
AU - Grieger, Khara
AU - Jones, Jacob
AU - Gulyuk, Alexey V.
AU - Yingling, Yaroslava G.
AU - Chirkova, Rada
N1 - This research has been supported by the Science and Technologies for Phosphorus Sustainability (STEPS) Center under National Science Foundation Grant No. CBET-2019435.
PY - 2024
Y1 - 2024
AB - In large-scale multidisciplinary consortia endeavors that address problems of research, industry, and public-good significance, it is typically a priority to integrate the heterogeneous data contributed by the consortia participants into a unified data representation. Knowledge graphs (KGs) are a typical choice for the data model of the resulting data repositories. To overcome potential issues with terminology misalignment, consortia commonly dedicate resources to the development of shared languages (vocabularies), with the intent of enabling diverse participants to understand and build on each other's work. Our research focus in this paper is on the challenge of automating integration into unified KGs of diverse data that potentially use different terminology, with the help of the available shared languages to resolve terminology clashes. To address the challenge, we introduce a data-integration workflow called INTEGRATE-KG that is domain agnostic, yet domain aware through opportunities for the involvement of humans-in-the-loop. A key feature of the approach is in its use of the synonyms available for the shared languages to automate semantics-level terminology alignment across the individual data contributions after they have been submitted for integration. INTEGRATE-KG also includes a module for automatically enriching the available shared languages, with opportunities for domain experts to provide semantic corrections and feedback. We present the workflow, report on our experiences with applying it to experimental, survey, and shared-language data on phosphorus sustainability, and provide suggestions for involving domain experts in INTEGRATE-KG as humans-in-the-loop.
KW - Knowledge graphs for big scientific and experimental data
KW - knowledge-graph applications
KW - knowledge-graph construction
UR - http://www.scopus.com/inward/record.url?scp=85218004287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218004287&partnerID=8YFLogxK
U2 - 10.1109/BigData62323.2024.10825736
DO - 10.1109/BigData62323.2024.10825736
M3 - Conference contribution
AN - SCOPUS:85218004287
T3 - Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
SP - 3522
EP - 3531
BT - Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
A2 - Ding, Wei
A2 - Lu, Chang-Tien
A2 - Wang, Fusheng
A2 - Di, Liping
A2 - Wu, Kesheng
A2 - Huan, Jun
A2 - Nambiar, Raghu
A2 - Li, Jundong
A2 - Ilievski, Filip
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 December 2024 through 18 December 2024
ER -