TY - GEN
T1 - Bitext name tagging for cross-lingual entity annotation projection
AU - Zhang, Dongxu
AU - Zhang, Boliang
AU - Pan, Xiaoman
AU - Feng, Xiaocheng
AU - Ji, Heng
AU - Xu, Weiran
N1 - Publisher Copyright:
© 1963-2018 ACL.
PY - 2016
Y1 - 2016
N2 - Annotation projection is a practical method for dealing with the low-resource problem in incident language (IL) processing. Previous annotation projection methods relied mainly on word alignment results without any training process, which led to the propagation of noise caused by word alignment errors. In this paper, we focus on the named entity recognition (NER) task and propose a weakly-supervised framework to project entity annotations from English to ILs through bitexts. Instead of directly relying on word alignment results, this framework combines the advantages of rule-based methods and deep learning methods in two steps: first, it generates a high-confidence entity annotation set on the IL side with strict searching methods; second, it uses this high-confidence set to weakly supervise model training. Finally, the trained model is used to carry out the projection. Experimental results on two low-resource ILs show that the proposed method generates better annotations projected from English-IL parallel corpora. The performance of the IL name tagger also improves significantly when it is trained on the newly projected IL annotation set.
UR - http://www.scopus.com/inward/record.url?scp=85040925312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040925312&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85040925312
SN - 9784879747020
T3 - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers
SP - 461
EP - 470
BT - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
PB - Association for Computational Linguistics, ACL Anthology
T2 - 26th International Conference on Computational Linguistics, COLING 2016
Y2 - 11 December 2016 through 16 December 2016
ER -