TY - GEN
T1 - Active sample selection for named entity transliteration
AU - Goldwasser, Dan
AU - Roth, Dan
PY - 2008
Y1 - 2008
AB - This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge, which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.
UR - http://www.scopus.com/inward/record.url?scp=79955665784&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955665784&partnerID=8YFLogxK
U2 - 10.3115/1557690.1557705
DO - 10.3115/1557690.1557705
M3 - Conference contribution
AN - SCOPUS:79955665784
SN - 9781932432046
T3 - ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 53
EP - 56
BT - ACL-08
PB - Association for Computational Linguistics (ACL)
T2 - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Y2 - 15 June 2008 through 20 June 2008
ER -