TY - GEN
T1 - Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes
AU - Lin, Wen Pin
AU - Snover, Matthew
AU - Ji, Heng
N1 - Publisher Copyright: © 2011 Association for Computational Linguistics
PY - 2011
Y1 - 2011
N2 - The automatic generation of entity profiles from unstructured text, such as Knowledge Base Population, if applied in a multi-lingual setting, generates the need to align such profiles from multiple languages in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mine name translation pairs from entity profiles, using Wikipedia Infoboxes as a stand-in for high quality entity profile extraction. Pairs are initially found using expressions that are written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. The algorithm maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an f-score of 76%. A side effect of the name mining algorithm is the unsupervised creation of a translation lexicon between the two languages, with an accuracy of 64%. We also duplicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers to compare with our approach. Comparisons show our approach can effectively augment the results from each of these alternative methods and resources.
UR - http://www.scopus.com/inward/record.url?scp=84899816843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899816843&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84899816843
T3 - Workshop on Unsupervised Learning in NLP at the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Proceedings
SP - 43
EP - 52
BT - Workshop on Unsupervised Learning in NLP at the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Proceedings
A2 - Abend, Omri
A2 - Korhonen, Anna
A2 - Rappoport, Ari
A2 - Reichart, Roi
PB - Association for Computational Linguistics (ACL)
T2 - 1st Workshop on Unsupervised Learning in NLP at the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Y2 - 30 July 2011
ER -