Fine-Grained Named Entity Recognition with Distant Supervision in COVID-19 Literature

Xuan Wang, Xiangchen Song, Bangzheng Li, Kang Zhou, Qi Li, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Biomedical named entity recognition (BioNER) is a fundamental step in mining COVID-19 literature. Existing BioNER datasets cover only a few common coarse-grained entity types (e.g., genes, chemicals, and diseases), so they cannot be used to recognize highly domain-specific entity types (e.g., animal models of diseases) or emerging ones (e.g., coronaviruses) that matter for COVID-19 studies. We present CORD-NER, a fine-grained named entity recognition (NER) dataset of COVID-19 literature (up to May 19, 2020). CORD-NER contains over 12 million sentences annotated via distant supervision, plus 2,000 manually curated sentences that serve as a test set for performance evaluation. CORD-NER covers 75 fine-grained entity types. In addition to the common biomedical entity types, it covers new entity types specifically related to COVID-19 studies, such as coronaviruses, viral proteins, evolution, and immune responses. The dictionaries for these fine-grained entity types are collected from existing knowledge bases and human-input seed sets. We further present DISTNER, a distantly supervised NER model that relies on a massive unlabeled corpus and a collection of dictionaries to annotate the COVID-19 corpus. DISTNER establishes benchmark performance on the CORD-NER test set for future research.
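The abstract does not spell out how dictionaries are turned into labels. As a rough illustration of the general idea behind dictionary-based distant supervision (not DISTNER's actual method), the Python sketch below tags a tokenized sentence in BIO format by greedy, case-insensitive longest match against per-type phrase lists. The entity type names (`CORONAVIRUS`, `VIRAL_PROTEIN`, `GENE_OR_GENOME`), the dictionary entries, whitespace tokenization, and the helper names `build_phrase_index` and `distant_bio_tags` are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionaries: entity type -> surface phrases (illustrative only).
EXAMPLE_DICTIONARIES: Dict[str, List[str]] = {
    "CORONAVIRUS": ["SARS-CoV-2", "MERS-CoV"],
    "VIRAL_PROTEIN": ["spike protein", "S protein"],
    "GENE_OR_GENOME": ["ACE2"],
}


def build_phrase_index(
    dictionaries: Dict[str, List[str]],
) -> Tuple[Dict[Tuple[str, ...], str], int]:
    """Map each lower-cased token tuple to its entity type.

    Also returns the length (in tokens) of the longest phrase, which bounds
    the match window. If a phrase appears under several types, the first
    type encountered wins (a simplifying assumption).
    """
    index: Dict[Tuple[str, ...], str] = {}
    max_len = 0
    for entity_type, phrases in dictionaries.items():
        for phrase in phrases:
            key = tuple(phrase.lower().split())
            index.setdefault(key, entity_type)
            max_len = max(max_len, len(key))
    return index, max_len


def distant_bio_tags(
    tokens: List[str], index: Dict[Tuple[str, ...], str], max_len: int
) -> List[str]:
    """Assign BIO tags by greedy left-to-right longest dictionary match."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match_len, match_type = 0, None
        # Try the longest window first so "spike protein" beats "protein".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = index.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                match_len, match_type = n, entity_type
                break
        if match_type is None:
            i += 1  # no dictionary phrase starts here
            continue
        tags[i] = f"B-{match_type}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{match_type}"
        i += match_len  # skip past the matched span
    return tags


if __name__ == "__main__":
    idx, longest = build_phrase_index(EXAMPLE_DICTIONARIES)
    sentence = "The SARS-CoV-2 spike protein binds ACE2 .".split()
    for token, tag in zip(sentence, distant_bio_tags(sentence, idx, longest)):
        print(token, tag)
```

On the example sentence this yields `B-CORONAVIRUS` for "SARS-CoV-2", `B-VIRAL_PROTEIN I-VIRAL_PROTEIN` for "spike protein", and `B-GENE_OR_GENOME` for "ACE2". Labels produced by plain string matching like this are noisy (unlisted synonyms are missed and ambiguous phrases are mislabeled), which is the kind of gap a model that also learns from a large unlabeled corpus is meant to narrow.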

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Editors: Taesung Park, Young-Rae Cho, Xiaohua Tony Hu, Illhoi Yoo, Hyun Goo Woo, Jianxin Wang, Julio Facelli, Seungyoon Nam, Mingon Kang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 491-494
Number of pages: 4
ISBN (Electronic): 9781728162157
State: Published - Dec 16 2020
Event: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 - Virtual, Seoul, Korea, Republic of
Duration: Dec 16 2020 to Dec 19 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Conference

Conference: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Country/Territory: Korea, Republic of
City: Virtual, Seoul
Period: 12/16/20 to 12/19/20

Keywords

  • COVID-19
  • distant supervision
  • fine-grained named entity recognition

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Medicine (miscellaneous)
  • Health Informatics
