TY - GEN
T1 - Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
AU - Mao, Yuning
AU - Ma, Wenchang
AU - Lei, Deren
AU - Han, Jiawei
AU - Ren, Xiang
N1 - Funding Information:
We thank Yiqing Xie and anonymous reviewers for helpful feedback. Research was supported in part by US DARPA KAIROS Program No. FA8750-19-2-1004 and SocialSim Program No. W911NF-17-C-0099, National Science Foundation Grants IIS-19-56151, IIS-17-41317, and IIS-17-04532, and the Molecule Maker Lab Institute: an AI Research Institutes program supported by NSF under Award No. 2019897. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily represent the views, either expressed or implied, of DARPA or the U.S. Government.
Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - Prior studies on text-to-text generation typically assume that the model can figure out what to attend to in the input and what to include in the output via seq2seq learning, with only the parallel training data and no additional guidance. However, it remains unclear whether current models can preserve important concepts in the source input, as seq2seq learning does not place explicit focus on concepts and commonly used evaluation metrics treat concepts as equally important as other tokens. In this paper, we present a systematic analysis of whether current seq2seq models, especially pre-trained language models, are good enough at preserving important input concepts and to what extent explicitly guiding generation with those concepts as lexical constraints is beneficial. We answer these questions by conducting extensive analytical experiments on four representative text-to-text generation tasks. Based on the observations, we then propose a simple yet effective framework to automatically extract, denoise, and enforce important input concepts as lexical constraints. This new method performs comparably to or better than its unconstrained counterpart on automatic metrics, demonstrates higher coverage for concept preservation, and receives better ratings in human evaluation.
UR - http://www.scopus.com/inward/record.url?scp=85127380776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127380776&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85127380776
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 5063
EP - 5074
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 7 November 2021 through 11 November 2021
ER -