Information network construction and alignment from automatically acquired comparable corpora

Heng Ji, Adam Lee, Wen Pin Lin

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

In this paper we describe a novel approach to discovering cross-lingual comparable corpora based on video comparison. We then propose a new task: extracting and aligning information networks from comparable corpora. As a case study demonstrating the effectiveness of bilingual information networks, we present a weakly supervised, language-independent approach to mining name translation pairs. Because certain types of expressions are written in language-independent forms, we can generate seed pairs automatically. Starting from these seeds, we apply a bootstrapping algorithm based on link comparison to mine more pairs iteratively. Results show that our approach produces highly reliable name pairs. We also replicate two state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with our approach. The comparisons show that our approach can effectively augment the results of each of these alternative methods and resources.
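
The seed-and-bootstrap idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graph representation, the function names `seed_pairs` and `bootstrap_align`, the `min_shared` threshold, and the toy networks are all assumptions made for illustration. Here, nodes with identical surface forms in both networks stand in for language-independent expressions, and a candidate pair is accepted when enough of the source node's already-aligned neighbors map onto neighbors of the target node.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of linked nodes


def seed_pairs(src: Graph, tgt: Graph) -> Dict[str, str]:
    """Assumed seeding rule: nodes spelled identically in both networks
    (e.g. numbers or dates) are treated as already aligned."""
    return {node: node for node in src if node in tgt}


def bootstrap_align(src: Graph, tgt: Graph, min_shared: int = 2) -> Dict[str, str]:
    """Iteratively align nodes by comparing links to already-aligned neighbors."""
    aligned = seed_pairs(src, tgt)
    used_tgt = set(aligned.values())
    while True:
        best: Dict[str, Tuple[int, str]] = {}  # src node -> (score, tgt node)
        for s, s_nbrs in src.items():
            if s in aligned:
                continue
            # Project the aligned neighbors of s into the target network.
            mapped = {aligned[n] for n in s_nbrs if n in aligned}
            for t, t_nbrs in tgt.items():
                if t in used_tgt:
                    continue
                score = len(mapped & t_nbrs)  # number of matching links
                if score >= min_shared and score > best.get(s, (0, ""))[0]:
                    best[s] = (score, t)
        if not best:
            break  # no candidate passes the threshold; stop iterating
        # Accept candidates greedily, highest score first, one target per source.
        for s, (_, t) in sorted(best.items(), key=lambda kv: (-kv[1][0], kv[0])):
            if t not in used_tgt:
                aligned[s] = t
                used_tgt.add(t)
    return aligned


# Toy English and Chinese networks (hypothetical data).
english: Graph = {
    "2008": {"Beijing", "Olympics"},
    "08-08": {"Beijing"},
    "Beijing": {"2008", "08-08", "Olympics"},
    "Olympics": {"2008", "Beijing"},
}
chinese: Graph = {
    "2008": {"北京", "奥运会"},
    "08-08": {"北京"},
    "北京": {"2008", "08-08", "奥运会"},
    "奥运会": {"2008", "北京"},
}

pairs = bootstrap_align(english, chinese)
print(pairs)
```

In this toy run, the second name pair only becomes reachable after the first is accepted, which is the point of iterating: each newly aligned pair adds evidence for its neighbors.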

Original language: English (US)
Title of host publication: Building and Using Comparable Corpora
Publisher: Springer
Pages: 243-263
Number of pages: 21
ISBN (Electronic): 9783642201288
ISBN (Print): 9783642201271
State: Published - Jan 1 2013
Externally published: Yes

Keywords

  • Cross-lingual comparable corpora
  • Information network
  • Name mining

ASJC Scopus subject areas

  • General Computer Science
