Statistical sentence extraction for information distillation

Dilek Hakkani-Tür, Gokhan Tur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component in a distillation engine is detecting sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. We frame this task as a classification problem, where each candidate sentence in the documents is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. The results indicate an 11%-13% relative improvement over a baseline keyword-spotting-based approach. We also show the robustness of our method on the audio subset of the document sources using manual and automatic transcriptions.
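
The classification framing in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes binary bag-of-words features as a stand-in for the lexical features, omits semantic features entirely, and trains plain discrete AdaBoost over one-word decision stumps. The toy sentences, labels, and all function names (`lexical_features`, `train_adaboost`, `is_relevant`) are hypothetical.

```python
import math

def lexical_features(sentence, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the sentence."""
    tokens = set(sentence.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]

def stump_predict(x, feature, polarity):
    """A one-feature stump: vote `polarity` if the word is present, else the opposite."""
    return polarity if x[feature] else -polarity

def train_adaboost(X, y, rounds=5):
    """Discrete AdaBoost over one-feature stumps; labels y are +1 (relevant) or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature_index, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for feature in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for w, x, t in zip(weights, X, y)
                          if stump_predict(x, feature, polarity) != t)
                if best is None or err < best[0]:
                    best = (err, feature, polarity)
        err, feature, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * t * stump_predict(x, feature, polarity))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def is_relevant(model, x):
    """Classify a feature vector by the sign of the weighted stump votes."""
    score = sum(alpha * stump_predict(x, feature, polarity)
                for alpha, feature, polarity in model)
    return score > 0

# Hypothetical toy data for a query about an earthquake.
training = [
    ("the earthquake damaged several buildings", 1),
    ("rescue teams reached the earthquake zone", 1),
    ("the stock market closed higher today", -1),
    ("the team won the football match", -1),
]
vocabulary = sorted({word for sentence, _ in training for word in sentence.split()})
X = [lexical_features(sentence, vocabulary) for sentence, _ in training]
y = [label for _, label in training]
model = train_adaboost(X, y)
```

With this toy model, `is_relevant(model, lexical_features("an earthquake struck the coast", vocabulary))` classifies a new candidate sentence. A real distillation setting would need far richer features and much more training data than this sketch suggests.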

Original language: English (US)
Title of host publication: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Pages: IV1-IV4
State: Published - 2007
Externally published: Yes
Event: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07 - Honolulu, HI, United States
Duration: Apr 15 2007 → Apr 20 2007

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
ISSN (Print): 1520-6149

Other

Other: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Country/Territory: United States
City: Honolulu, HI
Period: 4/15/07 → 4/20/07

Keywords

  • Information distillation
  • Information extraction
  • Language understanding
  • Natural language processing
  • Speech processing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
