Bitext name tagging for cross-lingual entity annotation projection

Dongxu Zhang, Boliang Zhang, Xiaoman Pan, Xiaocheng Feng, Heng Ji, Weiran Xu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Annotation projection is a practical method to deal with the low resource problem in incident languages (IL) processing. Previous methods on annotation projection mainly relied on word alignment results without any training process, which led to noise propagation caused by word alignment errors. In this paper, we focus on the named entity recognition (NER) task and propose a weakly-supervised framework to project entity annotations from English to IL through bitexts. Instead of directly relying on word alignment results, this framework combines advantages of rule-based methods and deep learning methods by implementing two steps: First, generates a high-confidence entity annotation set on IL side with strict searching methods; Second, uses this high-confidence set to weakly supervise the model training. The model is finally used to accomplish the projecting process. Experimental results on two low-resource ILs show that the proposed method can generate better annotations projected from English-IL parallel corpora. The performance of IL name tagger can also be improved significantly by training on the newly projected IL annotation set.

Original languageEnglish (US)
Title of host publicationCOLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
Subtitle of host publicationTechnical Papers
PublisherAssociation for Computational Linguistics, ACL Anthology
Pages461-470
Number of pages10
ISBN (Print)9784879747020
StatePublished - 2016
Externally publishedYes
Event26th International Conference on Computational Linguistics, COLING 2016 - Osaka, Japan
Duration: Dec 11 2016Dec 16 2016

Publication series

NameCOLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

Other

Other26th International Conference on Computational Linguistics, COLING 2016
Country/TerritoryJapan
CityOsaka
Period12/11/1612/16/16

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Bitext name tagging for cross-lingual entity annotation projection'. Together they form a unique fingerprint.

Cite this