Cross-lingual sentence extraction for information distillation

Adish Kumar Singla, Dilek Hakkani-Tür

Research output: Contribution to journal, Conference article, peer-review

Abstract

Information distillation aims to analyze and interpret large volumes of speech and text archives in multiple languages and produce structured information of interest to the user. In this work, we investigate cross-lingual information distillation, where non-English (source language) documents are searched for user queries that are in English (target language). We propose to perform distillation both on the original source language data and on their English translations produced by machine translation, and to combine the two outputs. We experimentally show that the combination approach results in 8% to 16% absolute (13% to 31% relative) F-measure improvement over previous work.

Original language: English (US)
Pages (from-to): 2707-2710
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: Sep 22 2008 to Sep 26 2008

Keywords

  • Classification model combination
  • Cross-lingual processing
  • Information distillation
  • Sentence extraction

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
