Named entity transliteration with comparable corpora

Richard Sproat, Tao Tao, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora in which texts in the two languages cover some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using phonetic transliteration and the other using the temporal distribution of candidate pairs. Each approach works quite well on its own, but combining them achieves even better results. We then propose a novel score propagation method that exploits the co-occurrence of transliteration pairs within document pairs. This propagation method yields a further improvement over the best results from the previous step.
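The idea of scoring candidate pairs by their temporal distribution, and then combining that score with a phonetic one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the use of Pearson correlation over per-day mention counts, the linear interpolation in `combined_score`, the `weight` parameter, and all function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series.

    Returns 0.0 when either series is constant, since the
    correlation is undefined in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def combined_score(phonetic, temporal, weight=0.5):
    """Linear interpolation of two scores (an illustrative assumption)."""
    return weight * phonetic + (1 - weight) * temporal


def rank_candidates(source_counts, candidates, weight=0.5):
    """Rank candidate names for one source-language name.

    source_counts: per-day mention counts of the source name.
    candidates: dict mapping candidate name to a tuple of
        (per-day mention counts, phonetic score in [0, 1]).
    Returns (name, score) pairs sorted from best to worst.
    """
    scored = []
    for name, (counts, phonetic) in candidates.items():
        temporal = pearson(source_counts, counts)
        scored.append((name, combined_score(phonetic, temporal, weight)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a candidate whose mentions rise and fall on the same days as the source name receives a high temporal score, which can outweigh a slightly lower phonetic score. For example, `rank_candidates([0, 3, 5, 1, 0], {"A": ([0, 6, 10, 2, 0], 0.6), "B": ([4, 0, 0, 1, 5], 0.7)})` places the hypothetical candidate "A" ahead of "B".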

Original language: English (US)
Title of host publication: COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 73-80
Number of pages: 8
ISBN (Print): 1932432655, 9781932432657
State: Published - 2006
Event: 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006 - Sydney, NSW, Australia
Duration: Jul 17 2006 to Jul 21 2006

Publication series

Name: COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 1

Other

Other: 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006
Country/Territory: Australia
City: Sydney, NSW
Period: 7/17/06 to 7/21/06

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
