Abstract
Information distillation aims to analyze and interpret large volumes of speech and text archives in multiple languages and produce structured information of interest to the user. In this work, we investigate cross-lingual information distillation, where non-English (source language) documents are searched in response to user queries posed in English (target language). We propose to perform distillation both on the original source language data and on their English translations produced by machine translation, and to combine the two outputs. We show experimentally that the combination approach yields an 8% to 16% absolute (13% to 31% relative) F-measure improvement over previous work.
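The combination idea can be illustrated with a minimal sketch. The abstract does not state how the two outputs are combined, so the linear interpolation of per-sentence scores below is an assumption, not the authors' method. The function names, the `weight` and `threshold` parameters, and the toy scores are all hypothetical and exist only for illustration.

```python
from typing import Dict, Set


def combine_scores(
    source_scores: Dict[str, float],
    translation_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Linearly interpolate per-sentence relevance scores from two systems.

    ASSUMPTION: linear interpolation is a stand-in; the abstract does not
    specify the combination method. A sentence missing from one system is
    treated as having score 0.0 there.
    """
    sentence_ids = set(source_scores) | set(translation_scores)
    return {
        sid: weight * source_scores.get(sid, 0.0)
        + (1.0 - weight) * translation_scores.get(sid, 0.0)
        for sid in sentence_ids
    }


def select(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep the sentences whose combined score reaches the threshold."""
    return {sid for sid, score in scores.items() if score >= threshold}


def f_measure(predicted: Set[str], relevant: Set[str]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    true_positives = len(predicted & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up scores (not data from the paper).
    source_side = {"s1": 0.9, "s2": 0.3, "s3": 0.6}        # source-language system
    translation_side = {"s1": 0.7, "s2": 0.9, "s4": 0.6}   # system run on MT output
    relevant = {"s1", "s2", "s4"}                          # hypothetical gold labels

    combined = select(combine_scores(source_side, translation_side))
    print(sorted(combined), f_measure(combined, relevant))
```

In this sketch, a sentence that only one system scores highly can still be dropped after interpolation, while a sentence both systems agree on is kept. A different combination rule (for example, taking the maximum score) would trade precision for recall.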
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 2707-2710 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2008 |
Externally published | Yes |
Event | INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia; Duration: Sep 22 2008 → Sep 26 2008 |
Keywords
- Classification model combination
- Cross-lingual processing
- Information distillation
- Sentence extraction
ASJC Scopus subject areas
- Human-Computer Interaction
- Signal Processing
- Software
- Sensory Systems