On the ambiguity and relevance of place names in scientific text

Xiaoliang Jiang, Vetle I. Torvik

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

How hard is it to systematically identify and disambiguate place names in scientific text? To address this question, we applied MapAffil, a toponymic search interface, to a random sample of 500 place name sentences from PubMed abstracts. The algorithm correctly identified and disambiguated 39.2% of the place names in these sentences. An error analysis revealed six distinct challenges: Biological terms (14.2%), Method terms (11.6%), Acronyms (10%), References (6%), Other entity names (4.2%), and Other errors (2.2%). Interestingly, a large portion of the correctly identified place names appeared irrelevant to the subject matter. Many of the identification errors can be fixed easily, but irrelevance is much harder to address because it depends on semantics and purpose. To study the role of place in scientific text, it is not sufficient to disambiguate accurately; it is also necessary to assess the degree of relevance.

Original language: English (US)
Title of host publication: JCDL 2020 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 401-404
Number of pages: 4
ISBN (Electronic): 9781450375856
State: Published - Aug 1 2020
Event: 2020 ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2020 - Virtual, Online, China
Duration: Aug 1 2020 to Aug 5 2020

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Conference

Conference: 2020 ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2020
Country/Territory: China
City: Virtual, Online
Period: 8/1/20 to 8/5/20

Keywords

  • Geoparsing
  • Named entity recognition
  • Place name ambiguity
  • PubMed
  • Toponym resolution

ASJC Scopus subject areas

  • General Engineering
