Cross-lingual wikification using multilingual embeddings

Chen Tse Tsai, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wikipedia titles. The proposed method can be applied to all languages represented in Wikipedia, including those for which no machine translation technology is available. We create a challenging dataset in 12 languages and show that our proposed approach outperforms various baselines. Moreover, our model compares favorably with the best systems on the TAC KBP2015 Entity Linking task, including those that relied on the availability of translation from the target language to English.
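As a rough illustration of what a similarity between text snippets across languages can look like once words and Wikipedia titles share one embedding space, the sketch below averages the vectors of a mention's context words and ranks candidate titles by cosine similarity. This is not the authors' model: the function names, the averaging step, the placeholder titles, and the three-dimensional toy vectors are all assumptions made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(context_vectors, title_vectors):
    """Rank candidate titles by similarity to the averaged context vector.

    context_vectors: embeddings of the words around a mention (any language).
    title_vectors: mapping from candidate English Wikipedia title to embedding.
    Both are assumed to live in the same multilingual space.
    """
    context = average(context_vectors)
    scored = [(title, cosine(context, vec)) for title, vec in title_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written vectors (hypothetical values, not trained embeddings).
context_vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
title_vectors = {
    "Title_A": [1.0, 0.0, 0.0],
    "Title_B": [0.0, 1.0, 0.0],
}

ranking = rank_candidates(context_vectors, title_vectors)
best_title = ranking[0][0]
```

With these toy values the context vector points mostly along the first axis, so `Title_A` ranks first. A real system would obtain the vectors from trained multilingual embeddings and would likely combine this score with other features.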

Original language: English (US)
Title of host publication: 2016 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 589-598
Number of pages: 10
ISBN (Electronic): 9781941643914
State: Published - 2016
Event: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - San Diego, United States
Duration: Jun 12 2016 to Jun 17 2016

Publication series

Name: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference

Other

Other: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016
Country: United States
City: San Diego
Period: 6/12/16 to 6/17/16

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
