TY - CPAPER
T1 - A Framework for Bidirectional Decoding: Case Study in Morphological Inflection
T2 - Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Canby, Marc E.
AU - Hockenmaier, Julia
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
N2 - Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the “outside-in”: at each step, the model chooses to generate a token on the left, generate a token on the right, or join the left and right sequences. We argue that this is more principled than prior bidirectional decoders. Our proposal supports a variety of model architectures and includes several training methods, such as a dynamic programming algorithm that marginalizes out the latent ordering variable. Our model sets the state of the art (SOTA) on the 2022 and 2023 SIGMORPHON shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy, respectively. The model performs particularly well on long sequences, can implicitly learn the split point of words composed of a stem and affix, and performs better relative to the baseline on datasets that have fewer unique lemmas (but more examples per lemma).
UR - http://www.scopus.com/inward/record.url?scp=85175398452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175398452&partnerID=8YFLogxK
DO - 10.18653/v1/2023.findings-emnlp.297
M3 - Conference contribution
AN - SCOPUS:85175398452
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 4485
EP - 4507
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -
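
The abstract describes two components that are easy to picture concretely: the outside-in decoding loop (at each step, emit a token on the left, emit a token on the right, or join the two halves) and a dynamic program that marginalizes over generation orders. The Python sketch below is illustrative only, assuming a greedy decoder driven by a stand-in decision function; every name in it (decode_outside_in, make_oracle, marginal_probability, the stub probabilities) is hypothetical and is not the authors' implementation.

from enum import Enum

class Action(Enum):
    LEFT = "left"    # append a token at the right edge of the left sequence
    RIGHT = "right"  # prepend a token at the left edge of the right sequence
    JOIN = "join"    # concatenate left + right and stop

def decode_outside_in(step_fn, max_steps=100):
    """Greedy outside-in decoding loop. step_fn(left, right) stands in for
    the learned model's per-step decision and returns (Action, token or None)."""
    left, right = [], []
    for _ in range(max_steps):
        action, token = step_fn(left, right)
        if action is Action.JOIN:
            break
        if action is Action.LEFT:
            left.append(token)        # left sequence grows rightward
        else:
            right.insert(0, token)    # right sequence grows leftward
    return left + right

def make_oracle(target, split):
    """Toy stand-in 'model' that reproduces a known target, building the stem
    left-to-right and the suffix right-to-left before joining."""
    def step(left, right):
        if len(left) < split:
            return Action.LEFT, target[len(left)]
        if len(right) < len(target) - split:
            return Action.RIGHT, target[len(target) - 1 - len(right)]
        return Action.JOIN, None
    return step

print("".join(decode_outside_in(make_oracle(list("unbindable"), split=6))))
# -> unbindable: stem "unbind" generated on the left, suffix "able" on the right

The second sketch shows one plausible shape for the dynamic program mentioned in the abstract, not necessarily the paper's exact formulation: a state (i, j) records that the left sequence holds y[:i] and the right sequence holds y[j:], and the forward pass sums over every order in which the remaining tokens could have been generated, so the latent ordering variable is marginalized out. The probability functions here are uniform stubs standing in for a trained model.

def marginal_probability(y, p_left, p_right, p_join):
    """alpha[(i, j)] is the total probability, over all generation orders,
    of reaching the state where the left sequence holds y[:i] and the
    right sequence holds y[j:]."""
    n = len(y)
    alpha = {(0, n): 1.0}
    for span in range(n, 0, -1):          # span = number of tokens still missing
        for i in range(n - span + 1):
            j = i + span
            a = alpha.get((i, j), 0.0)
            if not a:
                continue
            alpha[(i + 1, j)] = alpha.get((i + 1, j), 0.0) + a * p_left(y[i], i, j)
            alpha[(i, j - 1)] = alpha.get((i, j - 1), 0.0) + a * p_right(y[j - 1], i, j)
    # Any meeting point i == j may terminate the sequence with a JOIN.
    return sum(alpha.get((i, i), 0.0) * p_join(i) for i in range(n + 1))

uniform = lambda tok, i, j: 0.4           # stub model probabilities, hypothetical
print(marginal_probability(list("cats"), uniform, uniform, lambda i: 0.2))
# sums over all 2**4 left/right generation orders of "cats"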