TY - CONF
T1 - Language Model Pre-Training with Sparse Latent Typing
AU - Ren, Liliang
AU - Zhang, Zixuan
AU - Wang, Han
AU - Voss, Clare R.
AU - Zhai, Chengxiang
AU - Ji, Heng
N1 - We thank the anonymous reviewers for their helpful suggestions. This material is based upon work supported in part by the IBM-Illinois Discovery Accelerator Institute and by the National Science Foundation under Grant No. 1801652. This research is also based upon work supported by the U.S. DARPA KAIROS Program No. FA8750-19-2-1004 and the U.S. DARPA AIDA Program No. FA8750-18-2-0014. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. We also thank Suyu Ge for early discussions on this project.
PY - 2022
Y1 - 2022
AB - Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction-related downstream tasks in both supervised and few-shot settings. Our code is publicly available at https://github.com/renll/SparseLT.
UR - http://www.scopus.com/inward/record.url?scp=85149435380&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149435380&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.emnlp-main.96
DO - 10.18653/v1/2022.emnlp-main.96
M3 - Paper
AN - SCOPUS:85149435380
SP - 1480
EP - 1494
T2 - 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Y2 - 7 December 2022 through 11 December 2022
ER -