TY - GEN
T1 - Learning from mistakes
T2 - 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2020
AU - Reed, Lena
AU - Harrison, Vrindavan
AU - Oraby, Shereen
AU - Hakkani-Tür, Dilek
AU - Walker, Marilyn
N1 - Publisher Copyright:
© 2020 Association for Computational Linguistics.
PY - 2020
Y1 - 2020
N2 - Natural language generators (NLGs) for task-oriented dialogue typically take a meaning representation (MR) as input, and are trained end-to-end with a corpus of MR/utterance pairs, where the MRs cover a specific set of dialogue acts and domain attributes. Creation of such datasets is labor-intensive and time-consuming. Therefore, dialogue systems for new domain ontologies would benefit from using data for pre-existing ontologies. Here we explore, for the first time, whether it is possible to train an NLG for a new larger ontology using existing training sets for the restaurant domain, where each set is based on a different ontology. We create a new, larger combined ontology, and then train an NLG to produce utterances covering it. For example, if one dataset has attributes for family friendly and rating information, and the other has attributes for decor and service, our aim is an NLG for the combined ontology that can produce utterances that realize values for family friendly, rating, decor and service. Initial experiments with a baseline neural sequence-to-sequence model show that this task is surprisingly challenging. We then develop a novel self-training method that identifies (errorful) model outputs, automatically constructs a corrected MR input to form a new (MR, utterance) training pair, and then repeatedly adds these new instances back into the training data. We then test the resulting model on a new test set. The result is a self-trained model whose performance is an absolute 75.4% improvement over the baseline model. We also report a human qualitative evaluation of the final model showing that it achieves high naturalness, semantic coherence and grammaticality.
AB - Natural language generators (NLGs) for task-oriented dialogue typically take a meaning representation (MR) as input, and are trained end-to-end with a corpus of MR/utterance pairs, where the MRs cover a specific set of dialogue acts and domain attributes. Creation of such datasets is labor-intensive and time-consuming. Therefore, dialogue systems for new domain ontologies would benefit from using data for pre-existing ontologies. Here we explore, for the first time, whether it is possible to train an NLG for a new larger ontology using existing training sets for the restaurant domain, where each set is based on a different ontology. We create a new, larger combined ontology, and then train an NLG to produce utterances covering it. For example, if one dataset has attributes for family friendly and rating information, and the other has attributes for decor and service, our aim is an NLG for the combined ontology that can produce utterances that realize values for family friendly, rating, decor and service. Initial experiments with a baseline neural sequence-to-sequence model show that this task is surprisingly challenging. We then develop a novel self-training method that identifies (errorful) model outputs, automatically constructs a corrected MR input to form a new (MR, utterance) training pair, and then repeatedly adds these new instances back into the training data. We then test the resulting model on a new test set. The result is a self-trained model whose performance is an absolute 75.4% improvement over the baseline model. We also report a human qualitative evaluation of the final model showing that it achieves high naturalness, semantic coherence and grammaticality.
UR - http://www.scopus.com/inward/record.url?scp=85098279658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098279658&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85098279658
T3 - SIGDIAL 2020 - 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference
SP - 21
EP - 34
BT - SIGDIAL 2020 - 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 1 July 2020 through 3 July 2020
ER -