TY - GEN
T1 - Discriminative Topic Mining via Category-Name Guided Text Embedding
AU - Meng, Yu
AU - Huang, Jiaxin
AU - Wang, Guangyuan
AU - Wang, Zihan
AU - Zhang, Chao
AU - Zhang, Yu
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also benefits directly keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
AB - Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also benefits directly keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
KW - Discriminative Analysis
KW - Text Classification
KW - Text Embedding
KW - Topic Mining
UR - http://www.scopus.com/inward/record.url?scp=85086578909&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086578909&partnerID=8YFLogxK
U2 - 10.1145/3366423.3380278
DO - 10.1145/3366423.3380278
M3 - Conference contribution
AN - SCOPUS:85086578909
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 2121
EP - 2132
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery
T2 - 29th International World Wide Web Conference, WWW 2020
Y2 - 20 April 2020 through 24 April 2020
ER -