TY - GEN
T1 - ABC: Attention with Bounded-memory Control
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
AU - Peng, Hao
AU - Kasai, Jungo
AU - Pappas, Nikolaos
AU - Yogatama, Dani
AU - Wu, Zhaofeng
AU - Kong, Lingpeng
AU - Schwartz, Roy
AU - Smith, Noah A.
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - Transformer architectures have achieved state-of-the-art results on a variety of natural language processing (NLP) tasks. However, their attention mechanism comes with a quadratic complexity in sequence length, making the computational overhead prohibitive, especially for long sequences. The attention context can be seen as a random-access memory with each token taking a slot. Under this perspective, the memory size grows linearly with the sequence length, and so does the overhead of reading from it. One way to improve efficiency is to bound the memory size. We show that disparate approaches can be subsumed into one abstraction, attention with bounded-memory control (ABC), and that they vary in their organization of the memory. ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem distinct. Second, this abstraction gives new insights: an established approach (Wang et al., 2020b), previously thought not to be applicable in causal attention, actually is. Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches, but replaces their heuristic memory-organizing functions with a learned, contextualized one. Our experiments on language modeling, machine translation, and masked language model finetuning show that our approach outperforms previous efficient attention models; compared to strong transformer baselines, it significantly improves inference time and space efficiency with no or negligible accuracy loss.
UR - http://www.scopus.com/inward/record.url?scp=85146908066&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146908066&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.acl-long.515
DO - 10.18653/v1/2022.acl-long.515
M3 - Conference contribution
AN - SCOPUS:85146908066
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 7469
EP - 7483
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
Y2 - 22 May 2022 through 27 May 2022
ER -