HowtogetaChineseName(Entity): Segmentation and Combination Issues

Hongyan Jing, Radu Florian, Xiaoqiang Luo, Tong Zhang, Abraham Ittycheriah

Research output: Contribution to conference (Paper, peer-reviewed)

Abstract

When building a Chinese named entity recognition system, one must deal with certain language-specific issues, such as whether the model should be based on characters or on words. While there is no unique answer to this question, we discuss in detail the advantages and disadvantages of each model, identify problems in segmentation, and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique for improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.
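The abstract does not spell out the four classifiers or the combination methods, so what follows is only a minimal sketch under stated assumptions. It combines hypothetical character-level BIO tag sequences from three classifiers by simple per-character majority voting. This generic scheme is swapped in for illustration and is not necessarily one of the methods in the paper. The sketch also shows how a relative reduction in F-measure error is computed, taking error as 1 - F. The names `majority_vote` and `relative_error_reduction`, the tag outputs, and the F-measure values are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-character tag sequences from several classifiers by majority vote.

    Hypothetical helper for illustration. Ties go to the tag proposed by the
    classifier listed first, which is assumed to be the most reliable one.
    """
    if not tag_sequences:
        raise ValueError("need at least one classifier output")
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all classifiers must tag the same characters")
    combined: List[str] = []
    for position in range(length):
        votes = [seq[position] for seq in tag_sequences]  # one vote per classifier
        counts = Counter(votes)
        top = max(counts.values())
        # The earliest classifier whose tag received the top vote count wins ties.
        combined.append(next(tag for tag in votes if counts[tag] == top))
    return combined


def relative_error_reduction(f_baseline: float, f_combined: float) -> float:
    """Relative reduction in F-measure error, with error defined as 1 - F."""
    error_baseline = 1.0 - f_baseline
    error_combined = 1.0 - f_combined
    return (error_baseline - error_combined) / error_baseline


if __name__ == "__main__":
    # Hypothetical character-level outputs for 李明在北京工作 ("Li Ming works in Beijing").
    chars = list("李明在北京工作")
    outputs = [
        ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"],  # classifier 1
        ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "O"],  # classifier 2
        ["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "O"],  # classifier 3
    ]
    combined = majority_vote(outputs)
    print(list(zip(chars, combined)))
    # Invented F-measures: error falls from 0.20 to 0.18, a 10% relative reduction.
    print(round(relative_error_reduction(0.80, 0.82), 3))  # prints 0.1
```

Here the vote recovers 李明 as a person and 北京 as a location, even though each of classifiers 2 and 3 makes one error. Per-character voting can, however, produce inconsistent sequences (for example an I-LOC directly after an O), which a real system would need to repair or constrain. The F-measures 0.80 and 0.82 are invented solely to illustrate the arithmetic and are not results from the paper.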

Original language: English (US)
Pages: 200-207
Number of pages: 8
State: Published - 2003
Externally published: Yes
Event: 8th Conference on Empirical Methods in Natural Language Processing, EMNLP 2003 - Sapporo, Japan
Duration: Jul 11 2003 - Jul 12 2003

Conference

Conference: 8th Conference on Empirical Methods in Natural Language Processing, EMNLP 2003
Country/Territory: Japan
City: Sapporo
Period: 7/11/03 - 7/12/03

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
