TY - GEN
T1 - Reliability-aware dynamic feature composition for name tagging
AU - Lin, Ying
AU - Liu, Liyuan
AU - Ji, Heng
AU - Yu, Dong
AU - Han, Jiawei
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - While word embeddings are widely used for a variety of tasks and substantially improve the performance, their quality is not consistent throughout the vocabulary due to the long-tail distribution of word frequency. Without sufficient contexts, embeddings of rare words are usually less reliable than those of common words. However, current models typically trust all word embeddings equally regardless of their reliability and thus may introduce noise and hurt the performance. Since names often contain rare and unknown words, this problem is particularly critical for name tagging. In this paper, we propose a novel reliability-aware name tagging model to tackle this issue. We design a set of word frequency-based reliability signals to indicate the quality of each word embedding. Guided by the reliability signals, the model is able to dynamically select and compose features such as word embedding and character-level representation using gating mechanisms. For example, if an input word is rare, the model relies less on its word embedding and assigns higher weights to its character and contextual features. Experiments on OntoNotes 5.0 show that our model outperforms the baseline model, obtaining up to 6.2% absolute gain in F-score. In cross-genre experiments on six genres in OntoNotes, our model improves the performance for most genre pairs and achieves 2.3% absolute F-score gain on average.
UR - http://www.scopus.com/inward/record.url?scp=85084031697&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084031697&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084031697
T3 - ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 165
EP - 174
BT - ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Y2 - 28 July 2019 through 2 August 2019
ER -