Joshua: An open source toolkit for parsing-based machine translation

Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Jonathan Weese, Omar F. Zaidan

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We describe Joshua (Li et al., 2009a), an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for translation via synchronous context-free grammars (SCFGs): chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We also provide a demonstration outline that illustrates the toolkit's features for potential users, whether they are newcomers to the field or power users interested in extending the toolkit.
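Cube pruning is only named in the abstract above. As a rough illustration (a minimal sketch, not Joshua's implementation), the Python below shows the lazy grid exploration at its core: given the costs of competing rules and of a sub-item, each sorted ascending, it pops the k cheapest combinations from a priority queue instead of scoring every cell of the grid. The function name `cube_top_k` is hypothetical, and the sketch assumes purely additive costs. Once n-gram language model scores are added, combined costs are no longer monotone across the grid, which is why cube pruning becomes an approximate search.

```python
import heapq


def cube_top_k(rule_costs, item_costs, k):
    """Return the k cheapest (cost, i, j) combinations of two ascending cost lists.

    Hypothetical helper for illustration. Starts at the (0, 0) corner and, each
    time a cell is popped, pushes only its right and lower neighbours, so most
    of the grid is never scored. Exact only when costs are purely additive.
    """
    if not rule_costs or not item_costs or k <= 0:
        return []
    heap = [(rule_costs[0] + item_costs[0], 0, 0)]
    seen = {(0, 0)}  # cells already pushed, so none is scored twice
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append((cost, i, j))
        # Expand the frontier: next rule with the same item, or same rule with the next item.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(rule_costs) and nj < len(item_costs) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (rule_costs[ni] + item_costs[nj], ni, nj))
    return out
```

For example, `cube_top_k([1.0, 2.0, 4.0], [0.5, 3.0], 3)` returns `[(1.5, 0, 0), (2.5, 1, 0), (4.0, 0, 1)]` after scoring only 5 of the 6 grid cells. A real decoder would keep back-pointers to the rule and sub-item rather than bare indices.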

Original language: English (US)
Title of host publication: EACL Fourth Workshop on Statistical Machine Translation
Pages: 25-28
Number of pages: 4
State: Published - Mar 2009
Externally published: Yes
Event: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Suntec, Singapore
Duration: Aug 2 2009 to Aug 7 2009

Publication series

Name: ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.

Other

Other: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009
Country: Singapore
City: Suntec
Period: 8/2/09 to 8/7/09

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
