Information network construction and alignment from automatically acquired comparable corpora

Heng Ji, Adam Lee, Wen Pin Lin

In this paper we describe a novel approach to discovering cross-lingual comparable corpora based on video comparison. We then propose a new task: extracting and aligning information networks from comparable corpora. As a case study demonstrating the effectiveness of bilingual information networks, we present a weakly supervised, language-independent approach to mining name translation pairs. Because certain types of expressions are written in language-independent forms, we generate seed pairs automatically. Starting from these seeds, we apply a bootstrapping algorithm based on link comparison to mine more pairs iteratively. Results show that our approach can produce highly reliable name pairs. We also replicate two state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with our approach. The comparisons show that our approach can effectively augment the results from each of these alternative methods and resources.
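The seed-and-bootstrap idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function or the network representation, so everything here is an assumption made for illustration: each network is a dictionary mapping a node to a set of `(relation, neighbor)` links, relation labels are assumed to be shared across languages, seeds are nodes written identically in both networks, and the hypothetical `link_overlap` score and `threshold` parameter stand in for whatever link comparison the chapter actually uses. The toy data is invented.

```python
def seed_pairs(net_a, net_b):
    """Seeds: nodes written identically in both networks (e.g. numbers, years)."""
    return {node: node for node in net_a if node in net_b}


def link_overlap(links_a, links_b, aligned):
    """Fraction of links shared once neighbors of `a` are mapped through `aligned`.

    Assumed scoring rule, not taken from the source.
    """
    if not links_a or not links_b:
        return 0.0
    translated = {(rel, aligned[nbr]) for rel, nbr in links_a if nbr in aligned}
    return len(translated & links_b) / max(len(links_a), len(links_b))


def bootstrap(net_a, net_b, threshold=0.5, max_iters=10):
    """Iteratively grow the alignment, starting from automatically found seeds."""
    aligned = seed_pairs(net_a, net_b)
    for _ in range(max_iters):
        used_b = set(aligned.values())
        new_pairs = {}
        for a, links_a in net_a.items():
            if a in aligned:
                continue
            best, best_score = None, 0.0
            for b, links_b in net_b.items():
                if b in used_b:
                    continue
                score = link_overlap(links_a, links_b, aligned)
                if score > best_score:
                    best, best_score = b, score
            if best is not None and best_score >= threshold:
                new_pairs[a] = best
                used_b.add(best)  # keep the mapping one-to-one within a round
        if not new_pairs:
            break  # converged: no pair passed the threshold
        aligned.update(new_pairs)
    return aligned


# Invented toy networks; the years act as language-independent seeds.
net_en = {
    "Barack Obama": {("born_in", "1961"), ("elected_in", "2008")},
    "John McCain": {("born_in", "1936"), ("candidate_in", "2008")},
    "1961": set(), "1936": set(), "2008": set(),
}
net_zh = {
    "奥巴马": {("born_in", "1961"), ("elected_in", "2008")},
    "麦凯恩": {("born_in", "1936"), ("candidate_in", "2008")},
    "1961": set(), "1936": set(), "2008": set(),
}

pairs = bootstrap(net_en, net_zh)
```

In this sketch, names are aligned only because their links lead to nodes that are already aligned, which is why each round can unlock further pairs in larger networks.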

Original language: English (US)
Title of host publication: Building and Using Comparable Corpora
Number of pages: 21
ISBN (Electronic): 9783642201288
ISBN (Print): 9783642201271
State: Published - Jan 1 2013
Externally published: Yes


Keywords

  • Cross-lingual comparable corpora
  • Information network
  • Name mining

ASJC Scopus subject areas

  • General Computer Science


