Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification

Ming Jiang, Jennifer D’Souza, Sören Auer, J. Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight BERT-based classification models by focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps the BERT-based classification model identify the type of scientific relations. Although the strategy of predicting a single relation at a time generally achieves higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter strategy performs more consistently across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

Original language: English (US)
Title of host publication: Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings
Editors: Emi Ishita, Natalie Lee Pang, Lihong Zhou
Publisher: Springer
Pages: 3-19
Number of pages: 17
ISBN (Print): 9783030644512
State: Published - 2020
Event: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020 - Kyoto, Japan
Duration: Nov 30 2020 to Dec 1 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12504 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020
Country/Territory: Japan
City: Kyoto
Period: 11/30/20 to 12/1/20

Keywords

  • Digital library
  • Information extraction
  • Knowledge graphs
  • Neural machine learning
  • Scholarly text mining
  • Semantic relation classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
