Discriminative Topic Mining via Category-Name Guided Text Embedding

Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also benefits directly keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.

Original languageEnglish (US)
Title of host publicationThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PublisherAssociation for Computing Machinery
Pages2121-2132
Number of pages12
ISBN (Electronic)9781450370233
DOIs
StatePublished - Apr 20 2020
Event29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: Apr 20 2020Apr 24 2020

Publication series

NameThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020

Conference

Conference29th International World Wide Web Conference, WWW 2020
Country/TerritoryTaiwan, Province of China
CityTaipei
Period4/20/204/24/20

Keywords

  • Discriminative Analysis
  • Text Classification
  • Text Embedding
  • Topic Mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Discriminative Topic Mining via Category-Name Guided Text Embedding'. Together they form a unique fingerprint.

Cite this