Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Yifan Chen, Qi Zeng, Dilek Hakkani-Tur, Di Jin, Heng Ji, Yun Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on this analysis, we propose Skeinformer, which accelerates self-attention and further improves the accuracy of the matrix approximation to self-attention through column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
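
The abstract describes approximating the softmax attention matrix by sampling a subset of its columns (keys), guided by a cheap pilot pass and followed by row normalization. The NumPy sketch below illustrates that general idea under stated assumptions; it is a minimal illustration, not the authors' Skeinformer implementation, and the function names, the pilot-based importance estimate, and the importance-weighting scheme are assumptions made here for clarity.

```python
# Minimal, illustrative NumPy sketch of column-sampling attention approximation.
# NOT the authors' Skeinformer code; names and sampling choices are assumptions.
import numpy as np

def exact_attention(Q, K, V):
    """Dense softmax attention, O(n^2) in sequence length n (reference)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n) attention logits
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (A / A.sum(axis=1, keepdims=True)) @ V

def sketched_attention(Q, K, V, k=64, n_pilot=16, rng=None):
    """Approximate softmax(QK^T/sqrt(d)) V by sampling k key/value columns.

    A small 'pilot' set of query rows estimates how much each key column
    contributes; columns are then sampled with those probabilities and
    reweighted so the estimate targets the full attention output.
    """
    rng = np.random.default_rng(rng)
    n, d = Q.shape

    # Pilot pass: estimate per-column importance from a few query rows only.
    pilot_idx = rng.choice(n, size=min(n_pilot, n), replace=False)
    pilot_scores = Q[pilot_idx] @ K.T / np.sqrt(d)         # (n_pilot, n)
    col_mass = np.exp(pilot_scores - pilot_scores.max()).sum(axis=0)
    probs = col_mass / col_mass.sum()                      # sampling distribution over keys

    # Sample k key columns (with replacement) according to the pilot estimate.
    cols = rng.choice(n, size=min(k, n), p=probs)
    weights = 1.0 / (len(cols) * probs[cols])              # importance-sampling weights

    # Attention restricted to the sampled columns: O(n * k) instead of O(n^2).
    scores = Q @ K[cols].T / np.sqrt(d)                    # (n, k)
    A = np.exp(scores - scores.max(axis=1, keepdims=True)) * weights
    # Row normalization so each output row is a convex combination of values.
    return (A / A.sum(axis=1, keepdims=True)) @ V[cols]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    err = np.linalg.norm(sketched_attention(Q, K, V, k=128, rng=1) - exact_attention(Q, K, V))
    print("Frobenius error of the sketch:", err)
```

Restricting the score computation to k sampled columns reduces the dominant cost from O(n^2) to O(nk) in the sequence length n, which is the kind of reduction from quadratic to (near-)linear complexity that the abstract describes.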

Original language: English (US)
Title of host publication: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Place of publication: Seattle
Publisher: Association for Computational Linguistics
Pages: 5187-5199
Number of pages: 13
ISBN (Electronic): 9781955917711
DOIs
State: Published - 2022
Event: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States
Duration: Jul 10, 2022 - Jul 15, 2022

Conference

Conference: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
Country/Territory: United States
City: Seattle
Period: 7/10/22 - 7/15/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
