TY - GEN
T1 - Comparison of the impact of word segmentation on name tagging for Chinese and Japanese
AU - Li, Haibo
AU - Hagiwara, Masato
AU - Li, Qi
AU - Ji, Heng
N1 - Funding Information:
This work was supported by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), U.S. NSF CAREER Award under Grant IIS-0953149, U.S. DARPA Award No. FA8750-13-2-0041 in the “Deep Exploration and Filtering of Text” (DEFT) Program, IBM Faculty Award and RPI faculty start-up grant. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
PY - 2014
Y1 - 2014
AB - Word segmentation is usually considered an essential step for many Chinese and Japanese natural language processing tasks, such as name tagging. This paper presents several new observations on, and analyses of, the impact of word segmentation on name tagging: (1) because current state-of-the-art Chinese word segmentation performance is still limited, a character-based name tagger can outperform its word-based counterpart for Chinese but not for Japanese; (2) it is crucial to keep segmentation settings (e.g., definitions, specifications, methods) consistent between training and testing for name tagging; (3) as long as (2) is ensured, word segmentation performance does not have an appreciable impact on Chinese and Japanese name tagging.
KW - Information Extraction
KW - Name Tagging
KW - Word Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85027685486&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027685486&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85027685486
T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
SP - 2532
EP - 2536
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Declerck, Thierry
A2 - Mariani, Joseph
A2 - Maegaard, Bente
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Mazo, Helene
A2 - Piperidis, Stelios
A2 - Loftsson, Hrafn
PB - European Language Resources Association (ELRA)
T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014
Y2 - 26 May 2014 through 31 May 2014
ER -