TY - GEN
T1 - Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
T2 - 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
AU - Liu, Liyuan
AU - Ren, Xiang
AU - Shang, Jingbo
AU - Gu, Xiaotao
AU - Peng, Jian
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2018 Association for Computational Linguistics
PY - 2018
Y1 - 2018
N2 - Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), bringing significant improvements to various applications. To fully leverage nearly unlimited corpora and capture linguistic information at multiple levels, large LMs are required; but for a specific task, only part of this information is useful. Such large LMs, even at the inference stage, incur heavy computation workloads, making them too time-consuming for large-scale applications. Here we propose to compress bulky LMs while preserving useful information for a specific task. As different layers of the model keep different information, we develop a layer selection method for model pruning using sparsity-inducing regularization. By introducing dense connectivity, we can detach any layer without affecting the others and stretch shallow, wide LMs into deep, narrow ones. During training, LMs are learned with layer-wise dropout for better robustness. Experiments on two benchmark datasets demonstrate the effectiveness of our method.
AB - Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), bringing significant improvements to various applications. To fully leverage nearly unlimited corpora and capture linguistic information at multiple levels, large LMs are required; but for a specific task, only part of this information is useful. Such large LMs, even at the inference stage, incur heavy computation workloads, making them too time-consuming for large-scale applications. Here we propose to compress bulky LMs while preserving useful information for a specific task. As different layers of the model keep different information, we develop a layer selection method for model pruning using sparsity-inducing regularization. By introducing dense connectivity, we can detach any layer without affecting the others and stretch shallow, wide LMs into deep, narrow ones. During training, LMs are learned with layer-wise dropout for better robustness. Experiments on two benchmark datasets demonstrate the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=85081112874&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081112874&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85081112874
T3 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
SP - 1215
EP - 1225
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
A2 - Riloff, Ellen
A2 - Chiang, David
A2 - Hockenmaier, Julia
A2 - Tsujii, Jun'ichi
PB - Association for Computational Linguistics
Y2 - 31 October 2018 through 4 November 2018
ER -