Unsupervised named entity transliteration using temporal and phonetic correlation

Tao Tao, Su Youn Yoon, Andrew Fister, Richard Sproat, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we investigate unsupervised name transliteration using comparable corpora: corpora where texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using an unsupervised phonetic transliteration method and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining them one can achieve even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
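The temporal idea in the abstract can be illustrated with a small sketch. The abstract does not state which scoring function the paper uses, so the following assumes plain Pearson correlation over per-day mention counts; the function names `pearson` and `rank_candidates` and the sample counts are hypothetical and for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series.

    Returns 0.0 when either series is constant, since the
    correlation is undefined in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_candidates(source_counts, candidates):
    """Rank target-language candidates by temporal correlation.

    source_counts: per-day mention counts of a source-language name.
    candidates: dict mapping a candidate word to its per-day counts
                over the same days in the other corpus.
    Returns (candidate, score) pairs, best score first.
    """
    scored = [(word, pearson(source_counts, counts))
              for word, counts in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts: the source name spikes on days 2 and 5.
    source = [0, 5, 1, 0, 7, 0]
    candidates = {
        "cand_a": [0, 4, 1, 0, 6, 1],  # spikes on the same days
        "cand_b": [3, 3, 3, 3, 3, 3],  # flat, no temporal signal
        "cand_c": [5, 0, 4, 6, 0, 5],  # spikes on the other days
    }
    for word, score in rank_candidates(source, candidates):
        print(f"{word}: {score:.3f}")
```

A combined system along the lines the abstract describes would merge such a temporal score with a phonetic similarity score; how the paper weights the two is not given here, so that step is left out.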

Original language: English (US)
Title of host publication: COLING/ACL 2006 - EMNLP 2006
Subtitle of host publication: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 250-257
Number of pages: 8
ISBN (Print): 1932432736, 9781932432732
State: Published - 2006
Event: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 - Sydney, NSW, Australia
Duration: Jul 22 2006 to Jul 23 2006

Publication series

Name: COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006
Country/Territory: Australia
City: Sydney, NSW
Period: 7/22/06 to 7/23/06

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
