Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron Havens, Alexandre Araujo, Huan Zhang, Bin Hu

Research output: Contribution to journal › Conference article › peer-review

Abstract

Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention, leading to new non-vacuous certified robustness results for vision transformers. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST) that quantifies the effect of input feature perturbations on the attention output. Our analysis reveals that the local sensitivity of dot-product self-attention to ℓ2 perturbations can in fact be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings by computing non-vacuous certified ℓ2-robustness for vision transformers on the CIFAR-10 and SVHN datasets. The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.
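
For a concrete handle on the quantity the abstract refers to, the snippet below is a minimal NumPy sketch that empirically probes the local ℓ2 sensitivity of a standard single-head dot-product self-attention layer around a fixed, unperturbed input. It is only a Monte-Carlo estimate obtained from random perturbations on an ε-sphere, not the certified LoFAST bound from the paper; the dimensions, function names, and sampling scheme are illustrative assumptions, and the official implementation lives in the repository linked above.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        """Standard single-head dot-product self-attention.
        X: (n, d) unperturbed input; Wq, Wk, Wv: (d, d) weight matrices."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(X.shape[1]), axis=-1)
        return A @ V

    def empirical_local_sensitivity(X, Wq, Wk, Wv, eps=0.1, trials=200, seed=None):
        """Monte-Carlo estimate (a lower bound, not a certificate) of
        max ||f(X + dX) - f(X)|| / ||dX|| over ||dX|| = eps,
        where ||.|| is the Frobenius norm (stacked-token l2 norm)."""
        rng = np.random.default_rng(seed)
        base = self_attention(X, Wq, Wk, Wv)
        worst = 0.0
        for _ in range(trials):
            dX = rng.standard_normal(X.shape)
            dX *= eps / np.linalg.norm(dX)  # random perturbation on the eps-sphere
            out = self_attention(X + dX, Wq, Wk, Wv)
            worst = max(worst, np.linalg.norm(out - base) / np.linalg.norm(dX))
        return worst

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, d = 8, 16  # illustrative sequence length and embedding dimension
        X = rng.standard_normal((n, d))
        Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        print("empirical local l2 sensitivity:",
              empirical_local_sensitivity(X, Wq, Wk, Wv, seed=1))

Because such random-probe estimates only lower-bound the true local sensitivity, they complement rather than replace the certified analysis described in the paper.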

Original language: English (US)
Pages (from-to): 17680-17696
Number of pages: 17
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21 2024 - Jul 27 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
