TY - GEN
T1 - Illinois Japanese ↔ English News Translation for WMT 2021
AU - Le, Giang
AU - Mori, Shinka
AU - Schwartz, Lane
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - This system paper describes an end-to-end NMT pipeline for the Japanese ↔ English news translation task as submitted to WMT 2021, in which we explore the efficacy of techniques such as tokenizing with language-independent and language-dependent tokenizers, normalizing by orthographic conversion, creating a politeness-and-formality-aware model by implementing a tagger, back-translation, model ensembling, and n-best reranking. We use the parallel corpora provided by the WMT 2021 organizers for training, and the development and test data from WMT 2020 to evaluate the different experimental models. A Transformer neural network model is trained on the preprocessed corpora. We found that combining the techniques described herein, such as language-independent BPE tokenization, politeness and formality tags, model ensembling, n-best reranking, and back-translation, produced the best translation models relative to the other experimental systems.
UR - http://www.scopus.com/inward/record.url?scp=85127139433&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127139433&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85127139433
T3 - WMT 2021 - 6th Conference on Machine Translation, Proceedings
SP - 144
EP - 153
BT - WMT 2021 - 6th Conference on Machine Translation, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 6th Conference on Machine Translation, WMT 2021
Y2 - 10 November 2021 through 11 November 2021
ER -