TY - GEN
T1 - Unsupervised constraint driven learning for transliteration discovery
AU - Chang, Ming Wei
AU - Goldwasser, Dan
AU - Roth, Dan
AU - Tu, Yuancheng
PY - 2009
Y1 - 2009
N2 - This paper introduces a novel unsupervised constraint-driven learning algorithm for identifying named-entity (NE) transliterations in bilingual corpora. The proposed method does not require any annotated data or aligned corpora. Instead, it is bootstrapped using a simple resource: a romanization table. We show that this resource, when used in conjunction with constraints, can efficiently identify transliteration pairs. We evaluate the proposed method on transliterating English NEs to three different languages: Chinese, Russian and Hebrew. Our experiments show that constraint-driven learning can significantly outperform existing unsupervised models and achieve results competitive with existing supervised models.
AB - This paper introduces a novel unsupervised constraint-driven learning algorithm for identifying named-entity (NE) transliterations in bilingual corpora. The proposed method does not require any annotated data or aligned corpora. Instead, it is bootstrapped using a simple resource: a romanization table. We show that this resource, when used in conjunction with constraints, can efficiently identify transliteration pairs. We evaluate the proposed method on transliterating English NEs to three different languages: Chinese, Russian and Hebrew. Our experiments show that constraint-driven learning can significantly outperform existing unsupervised models and achieve results competitive with existing supervised models.
UR - http://www.scopus.com/inward/record.url?scp=77958023635&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77958023635&partnerID=8YFLogxK
U2 - 10.3115/1620754.1620798
DO - 10.3115/1620754.1620798
M3 - Conference contribution
AN - SCOPUS:77958023635
SN - 9781932432411
T3 - NAACL HLT 2009 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 299
EP - 307
BT - NAACL HLT 2009 - Human Language Technologies
PB - Association for Computational Linguistics (ACL)
T2 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009
Y2 - 31 May 2009 through 5 June 2009
ER -