Cross-type biomedical named entity recognition with deep multi-task learning

Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, Jiawei Han

Research output: Contribution to journal › Article

Abstract

Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type.

Results: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora.
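The abstract's central idea is to pool differently labeled corpora so that shared layers learn from every entity type. A small schematic can make this concrete. The sketch below is not the authors' code. It assumes hard parameter sharing, meaning shared character- and word-level encoders plus one output layer per dataset. Instead of training anything, it only counts which parameter groups each mini-batch would update. The functions `interleave_batches` and `train`, the parameter-group names and the toy datasets are all hypothetical.

```python
import random
from collections import Counter


def interleave_batches(datasets, epochs, seed=0):
    """Mix mini-batches from every dataset into one shuffled stream per epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        stream = [(name, batch) for name, batches in datasets.items() for batch in batches]
        rng.shuffle(stream)
        yield from stream


def train(datasets, epochs=1, seed=0):
    """Hypothetical schematic loop: record which parameter groups each batch would update."""
    updates = Counter()
    for name, _batch in interleave_batches(datasets, epochs, seed):
        # Assumed shared layers: every batch, whatever its entity type, passes through them.
        updates["shared_char_encoder"] += 1
        updates["shared_word_encoder"] += 1
        # Assumed task-specific layer: only the output layer of the batch's own dataset is used.
        updates[f"head:{name}"] += 1
    return updates


# Toy corpora (hypothetical): lists of placeholder mini-batches per entity type.
toy = {"genes": [["b1"], ["b2"], ["b3"]], "chemicals": [["b1"], ["b2"]], "diseases": [["b1"]]}
print(train(toy, epochs=2))
```

In this toy run over two epochs, both shared encoders are touched by all 12 batches. The gene, chemical and disease heads are touched by only 6, 4 and 2 batches respectively. This illustrates how, under the assumed design, the shared layers see every corpus while each output layer sees only its own.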

Original language: English (US)
Pages (from-to): 1745-1752
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 10
DOIs: https://doi.org/10.1093/bioinformatics/bty869
State: Published - May 15 2019

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Cross-type biomedical named entity recognition with deep multi-task learning. / Wang, Xuan; Zhang, Yu; Ren, Xiang; Zhang, Yuhao; Zitnik, Marinka; Shang, Jingbo; Langlotz, Curtis; Han, Jiawei.

In: Bioinformatics, Vol. 35, No. 10, 15.05.2019, p. 1745-1752.

Wang, X, Zhang, Y, Ren, X, Zhang, Y, Zitnik, M, Shang, J, Langlotz, C & Han, J 2019, 'Cross-type biomedical named entity recognition with deep multi-task learning', Bioinformatics, vol. 35, no. 10, pp. 1745-1752. https://doi.org/10.1093/bioinformatics/bty869
Wang X, Zhang Y, Ren X, Zhang Y, Zitnik M, Shang J et al. Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics. 2019 May 15;35(10):1745-1752. https://doi.org/10.1093/bioinformatics/bty869
Wang, Xuan ; Zhang, Yu ; Ren, Xiang ; Zhang, Yuhao ; Zitnik, Marinka ; Shang, Jingbo ; Langlotz, Curtis ; Han, Jiawei. / Cross-type biomedical named entity recognition with deep multi-task learning. In: Bioinformatics. 2019 ; Vol. 35, No. 10. pp. 1745-1752.
@article{b594163afcf14b6ca6d03209212b8a65,
title = "Cross-type biomedical named entity recognition with deep multi-task learning",
abstract = "Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type. Results: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora.",
author = "Xuan Wang and Yu Zhang and Xiang Ren and Yuhao Zhang and Marinka Zitnik and Jingbo Shang and Curtis Langlotz and Jiawei Han",
year = "2019",
month = "5",
day = "15",
doi = "10.1093/bioinformatics/bty869",
language = "English (US)",
volume = "35",
pages = "1745--1752",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",

}

TY  - JOUR
T1  - Cross-type biomedical named entity recognition with deep multi-task learning
AU  - Wang, Xuan
AU  - Zhang, Yu
AU  - Ren, Xiang
AU  - Zhang, Yuhao
AU  - Zitnik, Marinka
AU  - Shang, Jingbo
AU  - Langlotz, Curtis
AU  - Han, Jiawei
PY  - 2019/5/15
Y1  - 2019/5/15
N2  - Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type. Results: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora.
AB  - Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type. Results: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora.
UR  - http://www.scopus.com/inward/record.url?scp=85066061589&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85066061589&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/bty869
DO  - 10.1093/bioinformatics/bty869
M3  - Article
C2  - 30307536
AN  - SCOPUS:85066061589
VL  - 35
SP  - 1745
EP  - 1752
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 10
ER  -