TY - GEN
T1 - On learning language-invariant representations for universal machine translation
AU - Zhao, Han
AU - Hu, Junjie
AU - Risteski, Andrej
N1 - Funding Information:
We would like to thank Tom Mitchell for helpful conversations in the initial stages of the project, and Jiatao Gu for useful discussions on the recent progress in universal machine translation. HZ would like to acknowledge support from the DARPA XAI project, contract #FA87501720152 and NVIDIA’s GPU grant. JH is sponsored by the Air Force Research Laboratory under agreement number FA8750-19-2-0200.
Publisher Copyright:
© 2020 by the Authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - The goal of universal machine translation is to learn to translate between any pair of languages. Despite impressive empirical results and an increasing interest in massively multilingual models, theoretical analysis of translation errors made by such universal machine translation models is only nascent. In this paper, we formally prove certain impossibilities of this endeavour in general, as well as prove positive results in the presence of additional (but natural) structure of data. For the former, we derive a lower bound on the translation error in the many-to-many translation setting, which shows that any algorithm aiming to learn shared sentence representations among multiple language pairs has to make a large translation error on at least one of the translation tasks, if no assumption on the structure of the languages is made. For the latter, we show that if the paired documents in the corpus follow a natural encoder-decoder generative process, we can expect a natural notion of "generalization": a linear number of language pairs, rather than quadratic, suffices to learn a good representation. Our theory also explains what kinds of connection graphs between pairs of languages are better suited: ones with longer paths result in worse sample complexity. We believe our theoretical insights and implications contribute to the future algorithmic design of universal machine translation.
AB - The goal of universal machine translation is to learn to translate between any pair of languages. Despite impressive empirical results and an increasing interest in massively multilingual models, theoretical analysis of translation errors made by such universal machine translation models is only nascent. In this paper, we formally prove certain impossibilities of this endeavour in general, as well as prove positive results in the presence of additional (but natural) structure of data. For the former, we derive a lower bound on the translation error in the many-to-many translation setting, which shows that any algorithm aiming to learn shared sentence representations among multiple language pairs has to make a large translation error on at least one of the translation tasks, if no assumption on the structure of the languages is made. For the latter, we show that if the paired documents in the corpus follow a natural encoder-decoder generative process, we can expect a natural notion of "generalization": a linear number of language pairs, rather than quadratic, suffices to learn a good representation. Our theory also explains what kinds of connection graphs between pairs of languages are better suited: ones with longer paths result in worse sample complexity. We believe our theoretical insights and implications contribute to the future algorithmic design of universal machine translation.
UR - http://www.scopus.com/inward/record.url?scp=85104572002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104572002&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85104572002
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 11289
EP - 11301
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé III, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -