TY - GEN
T1 - Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging
AU - Zhang, Boliang
AU - Lu, Di
AU - Pan, Xiaoman
AU - Lin, Ying
AU - Abudukelimu, Halidanmu
AU - Ji, Heng
AU - Knight, Kevin
N1 - This work was supported by the U.S. DARPA LORELEI Program No. HR0011-15-C-0115. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
PY - 2017
Y1 - 2017
N2 - Current supervised name tagging approaches are inadequate for most low-resource languages due to the lack of annotated data and actionable linguistic knowledge. All supervised learning methods (including deep neural networks (DNN)) are sensitive to noise and thus they are not quite portable without massive clean annotations. We found that the F-scores of DNN-based name taggers drop rapidly (20%-30%) when we replace clean manual annotations with noisy annotations in the training data. We propose a new solution to incorporate many non-traditional language universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Linguistic Structure, CIA names, PanLex and survival guides. We acquire and encode various types of non-traditional linguistic resources into a DNN name tagger. Experiments on three low-resource languages show that feeding linguistic knowledge can make DNN significantly more robust to noise, achieving 8%-22% absolute F-score gains on name tagging without using any human annotation.
AB - Current supervised name tagging approaches are inadequate for most low-resource languages due to the lack of annotated data and actionable linguistic knowledge. All supervised learning methods (including deep neural networks (DNN)) are sensitive to noise and thus they are not quite portable without massive clean annotations. We found that the F-scores of DNN-based name taggers drop rapidly (20%-30%) when we replace clean manual annotations with noisy annotations in the training data. We propose a new solution to incorporate many non-traditional language universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Linguistic Structure, CIA names, PanLex and survival guides. We acquire and encode various types of non-traditional linguistic resources into a DNN name tagger. Experiments on three low-resource languages show that feeding linguistic knowledge can make DNN significantly more robust to noise, achieving 8%-22% absolute F-score gains on name tagging without using any human annotation.
UR - https://www.scopus.com/pages/publications/105019640970
M3 - Conference contribution
AN - SCOPUS:105019640970
T3 - 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017, System Demonstrations
SP - 362
EP - 372
BT - 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017
PB - Association for Computational Linguistics (ACL)
T2 - 8th International Joint Conference on Natural Language Processing, IJCNLP 2017
Y2 - 27 November 2017 through 1 December 2017
ER -