Abstract
In this paper we describe a novel approach to discovering cross-lingual comparable corpora based on video comparison. We then propose a new task: extracting and aligning information networks from comparable corpora. As a case study to demonstrate the effectiveness of utilizing bi-lingual information networks, we present a weakly-supervised and language-independent approach to mining name translation pairs. Based on the fact that certain types of expressions are written in language-independent forms, we generate seed pairs automatically. Starting from these seeds, we then apply a bootstrapping algorithm based on link comparison to mine more pairs iteratively. Results show that our approach can produce highly reliable name pairs. We also replicate two state-of-the-art name translation mining methods and use two existing name translation gazetteers to compare with our approach. Comparisons show that our approach can effectively augment the results from each of these alternative methods and resources.
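The seed-and-bootstrap idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the graph representation, the numeric-token seed rule, the scoring formula, the threshold, and the names `make_seeds`, `link_score`, and `bootstrap_pairs` are all assumptions made for illustration, and the toy entities are hypothetical.

```python
import re
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node label -> labels of linked nodes

# Assumption: numeric expressions (years, percentages) stand in for the
# "language-independent forms" mentioned in the abstract.
LANGUAGE_INDEPENDENT = re.compile(r"^[0-9][0-9.,:%/-]*$")


def all_labels(graph: Graph) -> Set[str]:
    """Collect every label that appears as a node or as a link target."""
    labels = set(graph)
    for links in graph.values():
        labels |= links
    return labels


def make_seeds(src_graph: Graph, tgt_graph: Graph) -> Dict[str, str]:
    """Pair labels that are identical in both networks and look numeric."""
    common = all_labels(src_graph) & all_labels(tgt_graph)
    return {label: label for label in common if LANGUAGE_INDEPENDENT.match(label)}


def link_score(src: str, tgt: str, src_graph: Graph, tgt_graph: Graph,
               aligned: Dict[str, str]) -> float:
    """Share of links that agree through already aligned neighbours."""
    src_links = src_graph.get(src, set())
    tgt_links = tgt_graph.get(tgt, set())
    if not src_links or not tgt_links:
        return 0.0
    shared = sum(1 for n in src_links if aligned.get(n) in tgt_links)
    return shared / max(len(src_links), len(tgt_links))


def bootstrap_pairs(src_graph: Graph, tgt_graph: Graph, seeds: Dict[str, str],
                    threshold: float = 0.6, max_rounds: int = 10) -> Dict[str, str]:
    """Grow the alignment round by round, starting from the seed pairs."""
    aligned = dict(seeds)
    for _ in range(max_rounds):
        used_targets = set(aligned.values())
        candidates = []
        for src in src_graph:
            if src in aligned:
                continue
            for tgt in tgt_graph:
                if tgt in used_targets:
                    continue
                score = link_score(src, tgt, src_graph, tgt_graph, aligned)
                if score >= threshold:
                    candidates.append((score, src, tgt))
        if not candidates:
            break  # nothing new passed the threshold, so stop iterating
        # Accept best-scoring pairs first and keep the mapping one-to-one.
        candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
        for score, src, tgt in candidates:
            if src not in aligned and tgt not in used_targets:
                aligned[src] = tgt
                used_targets.add(tgt)
    return aligned


# Toy networks with hypothetical entities; "T-..." marks target-language names.
src_graph = {"Alice": {"2008", "3.5%"}, "Bob": {"Alice", "1999"}}
tgt_graph = {"T-Alice": {"2008", "3.5%"}, "T-Bob": {"T-Alice", "1999"}}

seeds = make_seeds(src_graph, tgt_graph)
pairs = bootstrap_pairs(src_graph, tgt_graph, seeds)
```

In this toy run, `Alice` is aligned in the first round through the numeric seeds alone, while `Bob` only clears the threshold in the second round, once the `Alice` pair can itself serve as evidence. That dependence on earlier rounds is the point of the bootstrapping step.
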
Original language | English (US) |
---|---|
Title of host publication | Building and Using Comparable Corpora |
Publisher | Springer |
Pages | 243-263 |
Number of pages | 21 |
ISBN (Electronic) | 9783642201288 |
ISBN (Print) | 9783642201271 |
DOIs | |
State | Published - Jan 1 2013 |
Externally published | Yes |
Keywords
- Cross-lingual comparable corpora
- Information network
- Name mining
ASJC Scopus subject areas
- General Computer Science