Abstract
In this paper we focus on the effect of on-line speech segmentation and disfluency removal methods on conversational speech translation. In a real-time conversational speech-to-speech translation system, on-line segmentation of speech is required to avoid latency beyond a few seconds. While sentential unit segmentation and disfluency removal have been heavily studied, mainly for off-line speech processing, to the best of our knowledge the combined effect of these tasks on conversational speech translation has not been investigated. Furthermore, optimizing performance under the maximum system latency that still allows a conversation is a newer problem for these tasks. We show that the conventional practice of performing segmentation followed by disfluency removal is not the best approach. We propose a new approach that performs simple-disfluency removal, followed by segmentation, and then complex-disfluency removal. The proposed approach shows a significant gain in translation performance of up to 3 BLEU points with a look-ahead latency of only 6 seconds, using state-of-the-art machine translation and speech recognition systems.
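The ordering proposed in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's method: the filler list, the treatment of fillers as "simple" disfluencies and immediate word repetitions as "complex" ones, the rule-based segmenter, and all function names are hypothetical stand-ins for the trained models a real system would use. The look-ahead latency budget is not modeled.

```python
# Hypothetical three-stage pipeline: simple-disfluency removal,
# then segmentation, then complex-disfluency removal.
# All rules below are toy assumptions, not the paper's models.

FILLERS = {"uh", "um", "er", "hmm"}          # assumed "simple" disfluencies
BOUNDARY_WORDS = frozenset({"and", "but", "so"})  # assumed segment cues


def remove_simple_disfluencies(tokens):
    """Stage 1: drop filler words before any segmentation happens."""
    return [t for t in tokens if t.lower() not in FILLERS]


def segment(tokens, max_len=6):
    """Stage 2: toy segmenter. Start a new segment before a boundary
    word, or once the current segment already holds max_len tokens."""
    segments, current = [], []
    for tok in tokens:
        if current and (tok.lower() in BOUNDARY_WORDS or len(current) >= max_len):
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


def remove_complex_disfluencies(segment_tokens):
    """Stage 3: collapse immediate word repetitions inside one segment
    (an assumed stand-in for 'complex' disfluencies)."""
    out = []
    for tok in segment_tokens:
        if not out or out[-1].lower() != tok.lower():
            out.append(tok)
    return out


def clean_for_translation(tokens):
    """Run the three stages in the order the abstract proposes."""
    tokens = remove_simple_disfluencies(tokens)
    return [remove_complex_disfluencies(s) for s in segment(tokens)]


if __name__ == "__main__":
    utterance = "um I I want uh to go but we we should eat".split()
    for seg in clean_for_translation(utterance):
        print(" ".join(seg))
```

Removing fillers first means the segmenter sees a cleaner token stream, and running repetition removal per segment keeps it from merging words across a segment boundary. Whether these toy rules reflect the gains reported above is not something the sketch can show.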
Original language | English (US) |
---|---|
Pages (from-to) | 318-322 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2014 |
Externally published | Yes |
Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore; Duration: Sep 14 2014 → Sep 18 2014 |
Keywords
- Disfluency removal
- Segmentation
- Sentence units
- Speech processing
- Speech translation
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation