Abstract
Transformer-based models are inefficient at processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection we introduce a theoretical framework of matrix sketching. Based on this analysis, we propose Skeinformer, which accelerates self-attention and further improves the accuracy of the matrix approximation to self-attention via column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
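To make the sketching idea in the abstract concrete, the snippet below is a minimal NumPy sketch of column sampling applied to the self-attention product softmax(QK^T/√d)V: a small set of key/value rows is sampled with probability proportional to their squared key norms and rescaled so that both the numerator and the softmax normalizer are estimated consistently. The function name, the sampling distribution, and the simple row normalization are illustrative assumptions; the paper's exact sampling scheme, adaptive row normalization, and pilot sampling reutilization are not reproduced here.

```python
import numpy as np

def column_sampled_attention(Q, K, V, num_samples, rng=None):
    """Approximate softmax(Q K^T / sqrt(d)) @ V by sampling columns of the
    unnormalized attention matrix (i.e., a subset of key/value rows).

    Generic column-sampling sketch for illustration only; not the exact
    Skeinformer procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = Q.shape

    # Importance-sampling probabilities proportional to squared key norms
    # (an assumed, illustrative choice of sampling distribution).
    probs = np.sum(K ** 2, axis=1)
    probs = probs / probs.sum()

    # Sample num_samples key/value indices with replacement.
    idx = rng.choice(n, size=num_samples, replace=True, p=probs)

    # Unnormalized attention scores restricted to the sampled keys.
    scores = Q @ K[idx].T / np.sqrt(d)                    # shape (n, num_samples)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability

    # Rescale each sampled column by 1 / (num_samples * p_j); the same factor
    # appears in the numerator and the normalizer, keeping the ratio consistent.
    weights = np.exp(scores) / (num_samples * probs[idx])

    # Estimated softmax normalizer per query row, then the attention output.
    normalizer = weights.sum(axis=1, keepdims=True)
    return (weights @ V[idx]) / normalizer
```

With sequence length n = 4096 and num_samples = 256, for instance, the dominant cost of this sketch drops from O(n²d) for exact attention to O(n · num_samples · d), i.e., the linear-in-n regime the abstract refers to.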
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics |
Subtitle of host publication | Human Language Technologies |
Editors | Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz |
Place of Publication | Seattle |
Publisher | Association for Computational Linguistics |
Pages | 5187-5199 |
Number of pages | 13 |
ISBN (Electronic) | 9781955917711 |
DOIs | |
State | Published - 2022 |
Event | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States |
Duration | Jul 10 2022 → Jul 15 2022 |
Conference
Conference | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 7/10/22 → 7/15/22 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems
- Software