Abstract
This paper introduces a new method for identifying named-entity (NE) transliterations in bilingual corpora. Recent works have shown the advantage of discriminative approaches to transliteration: given two strings (w s,wt) in the source and target language, a classifier is trained to determine if wt is the transliteration of ws. This paper shows that the transliteration problem can be formulated as a constrained optimization problem and thus take into account contextual dependencies and constraints among character bi-grams in the two strings. We further explore several methods for learning the objective function of the optimization problem and show the advantage of learning it discriminately. Our experiments show that the new framework results in over 50% improvement in translating English NEs to Hebrew.
Original language | English (US) |
---|---|
Pages | 353-362 |
Number of pages | 10 |
DOIs | |
State | Published - 2008 |
Externally published | Yes |
Event | 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States Duration: Oct 25 2008 → Oct 27 2008 |
Other
Other | 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation |
---|---|
Country/Territory | United States |
City | Honolulu, HI |
Period | 10/25/08 → 10/27/08 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems