Active sample selection for named entity transliteration

Dan Goldwasser, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge, which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.

Original language: English (US)
Title of host publication: ACL-08
Subtitle of host publication: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Pages: 53-56
Number of pages: 4
State: Published - Dec 1 2008
Event: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT - Columbus, OH, United States
Duration: Jun 15 2008 to Jun 20 2008

Publication series

Name: ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Fingerprint

Classifiers
Linguistics
Sampling
Paradigm
Language
Sample Selection
Transliteration
Entity
Linguistic Knowledge
Train

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Networks and Communications
  • Linguistics and Language

Cite this

Goldwasser, D., & Roth, D. (2008). Active sample selection for named entity transliteration. In ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 53-56). (ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference).

@inproceedings{ae8fd78427754795b4fb9896b96fd036,
title = "Active sample selection for named entity transliteration",
abstract = "This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge, which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.",
author = "Dan Goldwasser and Dan Roth",
year = "2008",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781932432046",
series = "ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference",
pages = "53--56",
booktitle = "ACL-08",

}

TY - GEN

T1 - Active sample selection for named entity transliteration

AU - Goldwasser, Dan

AU - Roth, Dan

PY - 2008/12/1

Y1 - 2008/12/1

N2 - This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge, which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.

UR - http://www.scopus.com/inward/record.url?scp=79955665784&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955665784&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:79955665784

SN - 9781932432046

T3 - ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

SP - 53

EP - 56

BT - ACL-08

ER -