TY - GEN
T1 - Weakly-supervised neural text classification
AU - Meng, Yu
AU - Shen, Jiaming
AU - Zhang, Chao
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/10/17
Y1 - 2018/10/17
AB - Deep neural networks are gaining popularity for the classic text classification task, owing to their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited types of supervision. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method is flexible enough to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves strong performance without requiring excessive training data and significantly outperforms baseline methods.
KW - Neural Classification Model
KW - Pseudo Document Generation
KW - Text Classification
KW - Weakly-supervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85058063220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058063220&partnerID=8YFLogxK
U2 - 10.1145/3269206.3271737
DO - 10.1145/3269206.3271737
M3 - Conference contribution
AN - SCOPUS:85058063220
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 983
EP - 992
BT - CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
A2 - Paton, Norman
A2 - Candan, Selcuk
A2 - Wang, Haixun
A2 - Allan, James
A2 - Agrawal, Rakesh
A2 - Labrinidis, Alexandros
A2 - Cuzzocrea, Alfredo
A2 - Zaki, Mohammed
A2 - Srivastava, Divesh
A2 - Broder, Andrei
A2 - Schuster, Assaf
PB - Association for Computing Machinery
T2 - 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Y2 - 22 October 2018 through 26 October 2018
ER -