TY - GEN
T1 - Unsupervised named entity transliteration using temporal and phonetic correlation
AU - Tao, Tao
AU - Yoon, Su Youn
AU - Fister, Andrew
AU - Sproat, Richard
AU - Zhai, Cheng Xiang
PY - 2006
Y1 - 2006
N2 - In this paper we investigate unsupervised name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics - and therefore share references to named entities - but are not translations of each other. We present two distinct methods for transliteration, one approach using an unsupervised phonetic transliteration method, and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features, and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
AB - In this paper we investigate unsupervised name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics - and therefore share references to named entities - but are not translations of each other. We present two distinct methods for transliteration, one approach using an unsupervised phonetic transliteration method, and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features, and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
UR - http://www.scopus.com/inward/record.url?scp=80053348383&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053348383&partnerID=8YFLogxK
U2 - 10.3115/1610075.1610112
DO - 10.3115/1610075.1610112
M3 - Conference contribution
AN - SCOPUS:80053348383
SN - 1932432736
SN - 9781932432732
T3 - COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 250
EP - 257
BT - COLING/ACL 2006 - EMNLP 2006
PB - Association for Computational Linguistics (ACL)
T2 - 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006
Y2 - 22 July 2006 through 23 July 2006
ER -