Weakly-supervised neural text classification

Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited types of supervision. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.

Original language: English (US)
Title of host publication: CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Editors: Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
Publisher: Association for Computing Machinery
Pages: 983-992
Number of pages: 10
ISBN (Electronic): 9781450360142
State: Published - Oct 17 2018
Event: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018 to Oct 26 2018

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country/Territory: Italy
City: Torino
Period: 10/22/18 to 10/26/18

Keywords

  • Neural Classification Model
  • Pseudo Document Generation
  • Text Classification
  • Weakly-supervised Learning

ASJC Scopus subject areas

  • General Business, Management and Accounting
  • General Decision Sciences
