An online relevant set algorithm for statistical machine translation

Christoph Tillmann, Tong Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a novel online relevant set algorithm for a linearly scored block sequence translation model. The key component is a new procedure to directly optimize the global scoring function used by a statistical machine translation (SMT) decoder. This training procedure treats the decoder as a black box, and thus can be used to optimize any decoding scheme. The novel algorithm is evaluated using different feature types: 1) commonly used probabilistic features, such as translation, language, or distortion model probabilities, and 2) binary features. In particular, encouraging results on a standard Arabic-English translation task are presented for a translation system that uses only binary feature functions. To further demonstrate the effectiveness of the novel training algorithm, a detailed comparison with the widely used minimum-error-rate (MER) training algorithm is presented using the same decoder and feature set. The online algorithm is simplified by introducing so-called "seed" block sequences, which enable the training to be carried out without a gold standard block translation. While the online training algorithm is extremely fast, it also improves translation scores over the MER algorithm in some experiments.

Original language: English (US)
Pages (from-to): 1274-1286
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 16
Issue number: 7
State: Published - Sep 2008
Externally published: Yes

Keywords

  • Discriminative learning
  • Online algorithm
  • Statistical machine translation

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
