Joint bilingual name tagging for parallel corpora

Qi Li, Haibo Li, Heng Ji, Wen Wang, Jing Zheng, Fei Huang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Traditional isolated monolingual name taggers tend to yield inconsistent results across two languages. In this paper, we propose two novel approaches to jointly and consistently extract names from parallel corpora. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework, incorporating cross-lingual features propagated between two languages. The second approach is based on a joint CRFs model to jointly decode sentence pairs, incorporating bilingual factors based on word alignment. Experiments on Chinese-English parallel corpora demonstrated that the proposed methods significantly outperformed monolingual name taggers, were robust to automatic alignment noise and achieved state-of-the-art performance. With only 20%of the training data, our proposed methods can already achieve better performance compared to the baseline learned from the whole training set.

Original languageEnglish (US)
Title of host publicationCIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages1727-1731
Number of pages5
DOIs
StatePublished - 2012
Externally publishedYes
Event21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012Nov 2 2012

Publication series

NameACM International Conference Proceeding Series

Other

Other21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/TerritoryUnited States
CityMaui, HI
Period10/29/1211/2/12

Keywords

  • bilingual
  • joint crfs
  • name tagging

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Joint bilingual name tagging for parallel corpora'. Together they form a unique fingerprint.

Cite this