TY - GEN
T1 - Improving speech translation with automatic boundary prediction
AU - Matusov, Evgeny
AU - Hillard, Dustin
AU - Magimai-Doss, Mathew
AU - Hakkani-Tur, Dilek
AU - Ostendorf, Mari
AU - Ney, Hermann
PY - 2007
Y1 - 2007
AB - This paper investigates the influence of automatic sentence boundary and sub-sentence punctuation prediction on machine translation (MT) of automatically recognized speech. We use prosodic and lexical cues to determine sentence boundaries, and successfully combine two complementary approaches to sentence boundary prediction. We also introduce a new feature for segmentation prediction that directly considers the assumptions of the phrase translation model. In addition, we show how automatically predicted commas can be used to constrain reordering in MT search. We evaluate the presented methods using a state-of-the-art phrase-based statistical MT system on two large vocabulary tasks. We find that careful optimization of the segmentation parameters directly for translation quality improves the translation results in comparison to independent optimization for segmentation quality of the predicted source language sentence boundaries.
UR - http://www.scopus.com/inward/record.url?scp=56149108304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56149108304&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:56149108304
SN - 9781605603162
T3 - International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
SP - 2448
EP - 2451
BT - International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
T2 - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Y2 - 27 August 2007 through 31 August 2007
ER -