
Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging

  • Boliang Zhang
  • Di Lu
  • Xiaoman Pan
  • Ying Lin
  • Halidanmu Abudukelimu
  • Heng Ji
  • Kevin Knight

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Current supervised name tagging approaches are inadequate for most low-resource languages due to the lack of annotated data and actionable linguistic knowledge. All supervised learning methods (including deep neural networks (DNN)) are sensitive to noise and thus they are not quite portable without massive clean annotations. We found that the F-scores of DNN-based name taggers drop rapidly (20%-30%) when we replace clean manual annotations with noisy annotations in the training data. We propose a new solution to incorporate many non-traditional language universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Linguistic Structure, CIA names, PanLex and survival guides. We acquire and encode various types of non-traditional linguistic resources into a DNN name tagger. Experiments on three low-resource languages show that feeding linguistic knowledge can make DNN significantly more robust to noise, achieving 8%-22% absolute F-score gains on name tagging without using any human annotation.

Original language: English (US)
Title of host publication: 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017
Publisher: Association for Computational Linguistics (ACL)
Pages: 362-372
Number of pages: 11
ISBN (Electronic): 9781948087001
State: Published - 2017
Externally published: Yes
Event: 8th International Joint Conference on Natural Language Processing, IJCNLP 2017 - Taipei, Taiwan, Province of China
Duration: Nov 27 2017 to Dec 1 2017

Publication series

Name: 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017, System Demonstrations
Volume: 1

Conference

Conference: 8th International Joint Conference on Natural Language Processing, IJCNLP 2017
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 11/27/17 to 12/1/17

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
