TY - GEN
T1 - On the ambiguity and relevance of place names in scientific text
AU - Jiang, Xiaoliang
AU - Torvik, Vetle I.
N1 - Publisher Copyright: © 2020. ACM ISBN.
PY - 2020/8/1
Y1 - 2020/8/1
N2 - How hard is it to systematically identify and disambiguate place names in scientific text? In order to address this question, we applied MapAffil, a toponymic search interface, on a random sample of 500 place name sentences from PubMed abstracts. The algorithm correctly identified and disambiguated 39.2% of the place names in sentences. An error analysis revealed six unique challenges: Biological terms (14.2%), Method terms (11.6%), Acronyms (10%), References (6%), Other entity names (4.2%), and Other errors (2.2%). Interestingly, a large portion of the correctly identified place names appeared irrelevant to the subject matter. Many of these errors can be fixed easily, but irrelevance is much harder to address, for it depends on semantics and purpose. To study the role of place in scientific text, it is not sufficient to disambiguate accurately, but it is also necessary to be able to assess the degree of relevance.
KW - Geoparsing
KW - Named entity recognition
KW - Place name ambiguity
KW - PubMed
KW - Toponym resolution
UR - http://www.scopus.com/inward/record.url?scp=85095118904&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095118904&partnerID=8YFLogxK
U2 - 10.1145/3383583.3398618
DO - 10.1145/3383583.3398618
M3 - Conference contribution
AN - SCOPUS:85095118904
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 401
EP - 404
BT - JCDL 2020 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2020
Y2 - 1 August 2020 through 5 August 2020
ER -