In this paper we describe a novel approach to discovering cross-lingual comparable corpora based on video comparison. We then propose a new task: extracting and aligning information networks from comparable corpora. As a case study demonstrating the effectiveness of bilingual information networks, we present a weakly supervised, language-independent approach to mining name translation pairs. Because certain types of expressions are written in language-independent forms, we can generate seed pairs automatically. Starting from these seeds, we apply a bootstrapping algorithm based on link comparison to mine more pairs iteratively. Results show that our approach can produce highly reliable name pairs. We also replicate two state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with our approach. The comparisons show that our approach can effectively augment the results from each of these alternative methods and resources.
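The seed-and-bootstrap idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the data layout (each network as a mapping from a node to a set of `(relation, neighbour)` links), the assumption that relation labels are shared across languages, the `min_shared` threshold, the greedy one-to-one selection, and the toy networks are all hypothetical choices made for illustration.

```python
def bootstrap_align(src_links, tgt_links, seeds, min_shared=2, max_rounds=10):
    """Grow a node alignment from seed pairs by comparing links.

    src_links / tgt_links: dict mapping node -> set of (relation, neighbour).
    seeds: dict mapping source node -> target node (e.g. numbers or dates
    that are written identically in both languages).
    All parameter names and the scoring rule are illustrative assumptions.
    """
    aligned = dict(seeds)
    for _ in range(max_rounds):
        used_targets = set(aligned.values())
        candidates = []
        for s, s_edges in src_links.items():
            if s in aligned:
                continue
            # Project s's links into the target network through
            # neighbours that are already aligned.
            projected = {(rel, aligned[n]) for rel, n in s_edges if n in aligned}
            for t, t_edges in tgt_links.items():
                if t in used_targets:
                    continue
                shared = len(projected & t_edges)
                if shared >= min_shared:
                    candidates.append((shared, s, t))
        if not candidates:
            break  # nothing new was found, so the bootstrap has converged
        # Greedy one-to-one selection, strongest link evidence first.
        candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
        for _, s, t in candidates:
            if s not in aligned and t not in used_targets:
                aligned[s] = t
                used_targets.add(t)
    return aligned


# Toy, made-up networks: dates and years serve as language-independent seeds.
src_network = {
    "Obama": {("year", "1961"), ("date", "2009-11-15")},
    "Shanghai": {("date", "2009-11-15"), ("visitor", "Obama")},
}
tgt_network = {
    "奥巴马": {("year", "1961"), ("date", "2009-11-15")},
    "上海": {("date", "2009-11-15"), ("visitor", "奥巴马")},
}
seed_pairs = {"1961": "1961", "2009-11-15": "2009-11-15"}

aligned_pairs = bootstrap_align(src_network, tgt_network, seed_pairs)
```

In this toy run, "Obama" can be paired in the first round because both of its links end in seed nodes, while "Shanghai" only gathers enough shared links in the second round, once "Obama" is itself aligned. That dependence of later pairs on earlier ones is what makes the procedure iterative.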
- Cross-lingual comparable corpora
- Information network
ASJC Scopus subject areas
- Computer Science(all)