Chinese named entity recognition based on multilevel linguistic features

Honglei Guo, Jianmin Jiang, Gang Hu, Tong Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper presents a Chinese named entity recognition system that employs the Robust Risk Minimization (RRM) classification method and combines the advantages of character-based and word-based models. Experiments on a large-scale corpus show that significant performance enhancements can be obtained by integrating various kinds of linguistic information (such as Chinese word segmentation, semantic types, part of speech, and named entity triggers) into a basic Chinese character-based model. A novel feature weighting mechanism is also employed to obtain more useful cues from the most important linguistic features. Moreover, to overcome the limitation of computational resources in building a high-quality named entity recognition system from a large-scale corpus, informative samples are selected by an active learning approach.
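As an illustration of what integrating word-level information into a character-based model can look like, here is a minimal sketch. It is not the authors' implementation: the function name `char_features`, the B/I/E/S position scheme, the feature keys, the POS tag labels, and the example trigger set are all assumptions made for this example.

```python
from typing import Dict, List, Set


def char_features(
    words: List[str], pos_tags: List[str], triggers: Set[str]
) -> List[Dict[str, str]]:
    """Attach word-level cues to every character of a segmented sentence.

    Hypothetical helper: `words` is a word-segmented sentence, `pos_tags`
    holds one part-of-speech tag per word, and `triggers` is a set of
    named entity trigger words (for example, titles such as "主席").
    """
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")

    features: List[Dict[str, str]] = []
    for word, pos in zip(words, pos_tags):
        last = len(word) - 1
        for i, ch in enumerate(word):
            # Position of the character inside its word:
            # S = single-character word, B = begin, I = inside, E = end.
            if last == 0:
                seg = "S"
            elif i == 0:
                seg = "B"
            elif i == last:
                seg = "E"
            else:
                seg = "I"
            features.append(
                {
                    "char": ch,        # the basic character-level feature
                    "seg": seg,        # word segmentation cue
                    "word": word,      # the containing word
                    "pos": pos,        # part of speech of the containing word
                    "trigger": "yes" if word in triggers else "no",
                }
            )
    return features


if __name__ == "__main__":
    # Assumed example sentence, already segmented and tagged.
    sentence = ["江泽民", "主席", "访问", "北京"]
    tags = ["nr", "n", "v", "ns"]
    for feats in char_features(sentence, tags, {"主席"}):
        print(feats)
```

Each character keeps its own identity as the base feature while inheriting cues from the word that contains it, so a character-level classifier can use both views. The feature weighting and active learning steps mentioned in the abstract are not covered by this sketch.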

Original language: English (US)
Pages (from-to): 90-99
Number of pages: 10
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 3248
State: Published - 2005
Externally published: Yes
Event: First International Joint Conference on Natural Language Processing - IJCNLP 2004 - Hainan Island, China
Duration: Mar 22 2004 to Mar 24 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
