TY - GEN
T1 - Training a geographic entity recognizer on biomedical abstracts with the aid of embeddings, metadata, and linked data
AU - Jiang, Xiaoliang
AU - Bosch, Nigel
AU - Torvik, Vetle I.
N1 - Research reported in this publication was supported by the US National Institutes of Health (Award Number P01AG039347). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
PY - 2025/3/13
Y1 - 2025/3/13
AB - Public access to scientific literature has fueled research in text mining and natural language processing, yet the problem of geographic named entity recognition persists. This paper describes a recognizer that uses candidates from multiple existing Named Entity Recognition (NER) tools to ensure high recall and uses a filtering model trained on sentence embeddings, metadata, and citation data to improve precision. Experimental results on a manually curated set of biomedical abstracts show that this filtering model preserves high recall while achieving much higher precision than all of the individual NER tools. This should enable more effective geography-based analysis of scientific literature, for example, to study the role of place in biomedical discovery.
KW - Biomedical Text Mining
KW - Geographic Entity Recognition
KW - Geoparsing
KW - Information Extraction
KW - Linked Data
KW - Metadata
KW - Named Entity Recognition
KW - Natural Language Processing
KW - Scholarly Document Processing
KW - Sentence Embeddings
UR - http://www.scopus.com/inward/record.url?scp=105001132049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105001132049&partnerID=8YFLogxK
U2 - 10.1145/3677389.3702515
DO - 10.1145/3677389.3702515
M3 - Conference contribution
AN - SCOPUS:105001132049
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
BT - JCDL 2024 - Proceedings of the 24th ACM/IEEE Joint Conference on Digital Libraries
A2 - Wu, Jian
A2 - Hu, Xiao
A2 - Nurmikko-Fuller, Terhi
A2 - Chu, Sam
A2 - Yang, Ruixian
A2 - Downie, J. Stephen
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2024
Y2 - 16 December 2024 through 20 December 2024
ER -