Joint learning of Chinese words, terms and keywords

Ziqiang Cao, Sujian Li, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Previous work often used a pipelined framework in which Chinese word segmentation is followed by term extraction and keyword extraction. Such a framework suffers from error propagation and cannot leverage information from later modules to improve earlier components. In this paper, we propose a four-level Dirichlet Process based model (DP-4) to jointly learn the word distributions at the corpus, domain and document levels simultaneously. Based on the DP-4 model, a sentence-wise Gibbs sampler is adopted to obtain proper segmentation results. Meanwhile, terms and keywords are acquired in the sampling process. Experimental results show the effectiveness of our method.

Original language: English (US)
Title of host publication: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1774-1778
Number of pages: 5
ISBN (Electronic): 9781937284961
State: Published - Jan 1 2014
Externally published: Yes
Event: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Doha, Qatar
Duration: Oct 25 2014 to Oct 29 2014

Publication series

Name: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014
Country: Qatar
City: Doha
Period: 10/25/14 to 10/29/14

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Information Systems

Cite this

Cao, Z., Li, S., & Ji, H. (2014). Joint learning of Chinese words, terms and keywords. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1774-1778). (EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference). Association for Computational Linguistics (ACL).