TY - GEN
T1 - Discriminative Topic Mining via Category-Name Guided Text Embedding
AU - Meng, Yu
AU - Huang, Jiaxin
AU - Wang, Guangyuan
AU - Wang, Zihan
AU - Zhang, Chao
AU - Zhang, Yu
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way and often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
AB - Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way and often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
KW - Discriminative Analysis
KW - Text Classification
KW - Text Embedding
KW - Topic Mining
UR - http://www.scopus.com/inward/record.url?scp=85086578909&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086578909&partnerID=8YFLogxK
U2 - 10.1145/3366423.3380278
DO - 10.1145/3366423.3380278
M3 - Conference contribution
AN - SCOPUS:85086578909
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 2121
EP - 2132
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery, Inc
T2 - 29th International World Wide Web Conference, WWW 2020
Y2 - 20 April 2020 through 24 April 2020
ER -