Joint learning of Chinese words, terms and keywords

Ziqiang Cao, Sujian Li, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Previous work often used a pipelined framework in which Chinese word segmentation is followed by term extraction and keyword extraction. Such a framework suffers from error propagation and cannot use information from later modules to improve earlier components. In this paper, we propose a four-level Dirichlet Process based model (DP-4) to jointly learn the word distributions at the corpus, domain and document levels simultaneously. Based on the DP-4 model, a sentence-wise Gibbs sampler is adopted to obtain proper segmentation results. Meanwhile, terms and keywords are acquired during the sampling process. Experimental results show the effectiveness of our method.

Original language: English (US)
Title of host publication: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1774-1778
Number of pages: 5
ISBN (Electronic): 9781937284961
State: Published - 2014
Externally published: Yes
Event: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Doha, Qatar
Duration: Oct 25 2014 - Oct 29 2014

Publication series

Name: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014
Country/Territory: Qatar
City: Doha
Period: 10/25/14 - 10/29/14

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Information Systems
