TaxoGen: Unsupervised topic taxonomy construction by adaptive term embedding and clustering

Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.

Original languageEnglish (US)
Title of host publicationKDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages2701-2709
Number of pages9
ISBN (Print)9781450355520
DOIs
StatePublished - Jul 19 2018
Event24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom
Duration: Aug 19 2018Aug 23 2018

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
CountryUnited Kingdom
CityLondon
Period8/19/188/23/18

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'TaxoGen: Unsupervised topic taxonomy construction by adaptive term embedding and clustering'. Together they form a unique fingerprint.

  • Cite this

    Zhang, C., Tao, F., Chen, X., Shen, J., Jiang, M., Sadler, B., Vanni, M., & Han, J. (2018). TaxoGen: Unsupervised topic taxonomy construction by adaptive term embedding and clustering. In KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2701-2709). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/3219819.3220064