Cross-lingual Slot Filling from Comparable Corpora

Matthew Snover, Xiang Li, Wen Pin Lin, Zheng Chen, Suzanne Tamang, Mingmin Ge, Adam Lee, Qi Li, Hao Li, Sam Anzaroot, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces a new task of crosslingual slot filling, which aims to discover attributes for entity queries from crosslingual comparable corpora and then present answers in a desired language. It is a very challenging task that suffers from both information extraction and machine translation errors. In this paper, we analyze the types of errors produced by five different baseline approaches and present a novel supervised rescoring-based validation approach to incorporate global evidence from very large bilingual comparable corpora. Without using any additional labeled data, this new approach obtained 38.5% relative improvement in Precision and 86.7% relative improvement in Recall over several state-of-the-art approaches. The ultimate system outperformed monolingual slot filling pipelines built on much larger monolingual corpora.

Original language: English (US)
Title of host publication: 4th Workshop on Building and Using Comparable Corpora
Subtitle of host publication: Comparable Corpora and the Web, BUCC 2011 at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Proceedings
Editors: Pierre Zweigenbaum, Reinhard Rapp, Serge Sharoff
Publisher: Association for Computational Linguistics (ACL)
Pages: 110-119
Number of pages: 10
ISBN (Electronic): 9781937284015
State: Published - 2011
Externally published: Yes
Event: 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, BUCC 2011 at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, United States
Duration: Jun 24 2011 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, BUCC 2011 at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Country/Territory: United States
City: Portland
Period: 6/24/11 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
