Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes

Wen Pin Lin, Matthew Snover, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

The automatic generation of entity profiles from unstructured text, such as Knowledge Base Population, if applied in a multi-lingual setting, generates the need to align such profiles from multiple languages in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mine name translation pairs from entity profiles, using Wikipedia Infoboxes as a stand-in for high quality entity profile extraction. Pairs are initially found using expressions that are written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. The algorithm maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an f-score of 76%. A side effect of the name mining algorithm is the unsupervised creation of a translation lexicon between the two languages, with an accuracy of 64%. We also duplicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers to compare with our approach. Comparisons show our approach can effectively augment the results from each of these alternative methods and resources.
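The bootstrapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`neutral_tokens`, `best_match`, `mine_name_pairs`), the `min_overlap` threshold, the "exactly one leftover value on each side" alignment rule, and the toy infobox data are all assumptions made for illustration. The sketch seeds entity pairs from digit strings (standing in for dates and numbers), learns value translations from those pairs, and reuses the learned translations to find further pairs.

```python
import re


def neutral_tokens(values):
    """Digit runs, used here as a stand-in for language-independent values."""
    return {tok for v in values for tok in re.findall(r"\d+", v)}


def is_neutral(value):
    """True if the value contains a digit (treated as language-independent)."""
    return re.search(r"\d", value) is not None


def best_match(s_keys, tgt, taken, min_overlap):
    """Return the unique best-overlapping target entity, or None."""
    scored = sorted(
        ((len(s_keys & (neutral_tokens(t_vals) | set(t_vals))), t_name)
         for t_name, t_vals in tgt.items() if t_name not in taken),
        reverse=True,
    )
    if not scored or scored[0][0] < min_overlap:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates tie, so skip for now
    return scored[0][1]


def mine_name_pairs(src, tgt, rounds=5, min_overlap=2):
    """src, tgt: dict mapping entity name -> list of infobox value strings."""
    pairs, lexicon = {}, {}
    for _ in range(rounds):
        grew = False
        # Step 1: pair entities using neutral tokens plus translations so far.
        for s_name, s_vals in src.items():
            if s_name in pairs:
                continue
            s_keys = neutral_tokens(s_vals) | {
                lexicon[v] for v in s_vals if v in lexicon
            }
            match = best_match(s_keys, tgt, set(pairs.values()), min_overlap)
            if match is not None:
                pairs[s_name] = match
                grew = True
        # Step 2 (simplified assumption): if a paired entity has exactly one
        # untranslated non-numeric value on each side, align those two values.
        for s_name, t_name in pairs.items():
            s_left = [v for v in src[s_name]
                      if not is_neutral(v) and v not in lexicon]
            t_left = [v for v in tgt[t_name]
                      if not is_neutral(v) and v not in lexicon.values()]
            if len(s_left) == 1 and len(t_left) == 1:
                lexicon[s_left[0]] = t_left[0]
                grew = True
        if not grew:
            break
    return pairs, lexicon


# Toy data (invented placeholders, not drawn from Wikipedia).
src = {
    "Person A": ["1867", "1934", "City X"],
    "Person B": ["1900", "City X", "Physicist"],
}
tgt = {
    "PERSON_A_T": ["1867", "1934", "CITY_X_T"],
    "PERSON_B_T": ["1900", "CITY_X_T", "PHYSICIST_T"],
    "PERSON_C_T": ["1900", "CITY_Y_T"],  # distractor sharing one number
}
pairs, lexicon = mine_name_pairs(src, tgt)
```

In this toy run, "Person B" cannot be paired from numbers alone, because the distractor shares the same year. It is paired only in the second round, once the translation of "City X" has been learned from the first pair, which is the bootstrapping effect the abstract describes.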

Original language: English (US)
Title of host publication: Workshop on Unsupervised Learning in NLP at the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Proceedings
Editors: Omri Abend, Anna Korhonen, Ari Rappoport, Roi Reichart
Publisher: Association for Computational Linguistics (ACL)
Pages: 43-52
Number of pages: 10
ISBN (Electronic): 1937284131, 9781937284138
State: Published - 2011
Externally published: Yes
Event: 1st Workshop on Unsupervised Learning in NLP at the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: Jul 30 2011 → …


ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics
