Segmentation and disfluency removal for conversational speech translation

Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gokhan Tur

Research output: Contribution to journalConference articlepeer-review

Abstract

In this paper we focus on the effect of on-line speech segmentation and disfluency removal methods on conversational speech translation. In a real-time conversational speech to speech translation system, on-line segmentation of speech is required to avoid latency beyond few seconds. While sentential unit segmentation and disfluency removal have been heavily studied mainly for off-line speech processing, to the best of our knowledge, the combined effect of these tasks on conversational speech translation has not been investigated. Furthermore, optimization of performance given maximum allowable system latency to enable a conversation is a newer problem for these tasks. We show that the conventional assumption of doing segmentation followed by disfluency removal is not the best practice. We propose a new approach to do simple-disfluency removal followed by segmentation and then by complex-disfluency removal. The proposed approach shows a significant gain on translation performance of up to 3 Bleu points with only 6 second latency to look ahead, using state-ofthe art machine translation and speech recognition systems.

Original languageEnglish (US)
Pages (from-to)318-322
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
StatePublished - 2014
Externally publishedYes
Event15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: Sep 14 2014Sep 18 2014

Keywords

  • Disfluency removal
  • Segmentation
  • Sentence units
  • Speech processing
  • Speech translation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'Segmentation and disfluency removal for conversational speech translation'. Together they form a unique fingerprint.

Cite this