TY - GEN
T1 - Detecting Failures of Neural Machine Translation in the Absence of Reference Translations
AU - Wang, Wenyu
AU - Zheng, Wujie
AU - Liu, Dian
AU - Zhang, Changrong
AU - Zeng, Qinsong
AU - Deng, Yuetang
AU - Yang, Wei
AU - He, Pinjia
AU - Xie, Tao
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - Despite getting widely adopted recently, a Neural Machine Translation (NMT) system is often found to produce translation failures in the outputs. Developers have been relying on in-house system testing for quality assurance of NMT. This testing methodology requires human-constructed reference translations as the ground truth (test oracle) for example natural language inputs. The testing methodology has shown benefits of quickly enhancing an NMT system in early development stages. However, in industrial settings, it is desirable to detect translation failures without reliance on reference translations for enabling further improvements on translation quality in both industrial development and production environments. Aiming for a practical and scalable solution to such demand in the industrial settings, in this paper, we propose a new approach for automatically identifying translation failures without requiring reference translations for a translation task. Our approach focuses on a property of natural language translation that can be checked systematically by using information from both the test inputs (i.e., the texts to be translated) and the test outputs (i.e., the translations under inspection) of the NMT system. Our evaluation conducted on real-world datasets shows that our approach can effectively detect property violations as translation failures. By deploying our approach in the translation service of WeChat (a messenger app with more than one billion monthly active users), we show that our approach is both practical and scalable in the industrial settings.
KW - ML quality assurance
KW - failure detection
KW - neural machine translation
UR - http://www.scopus.com/inward/record.url?scp=85072180177&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072180177&partnerID=8YFLogxK
U2 - 10.1109/DSN-Industry.2019.00007
DO - 10.1109/DSN-Industry.2019.00007
M3 - Conference contribution
AN - SCOPUS:85072180177
T3 - Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - DSN 2019 Industry Track
SP - 1
EP - 4
BT - Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - DSN 2019 Industry Track
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Industry Track, DSN-Industry Track 2019
Y2 - 24 June 2019 through 27 June 2019
ER -