TY - GEN
T1 - ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
T2 - 17th European Conference on Computer Vision, ECCV 2022
AU - Li, Junbo
AU - Zhang, Huan
AU - Xie, Cihang
N1 - Funding Information:
Acknowledgment. This work is supported by a gift from Open Philanthropy, the TPU Research Cloud (TRC) program, and the Google Cloud Research Credits program.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Patch attack, which introduces a perceptible but localized change to the input image, has gained significant momentum in recent years. In this paper, we present a unified framework to analyze certified patch defense tasks, including both certified detection and certified recovery, leveraging the recently emerged Vision Transformers (ViTs). In addition to the existing patch defense setting where only one patch is considered, we provide the very first study on developing certified detection against the dual patch attack, in which the attacker is allowed to adversarially manipulate pixels in two different regions. By building upon the latest progress in self-supervised ViTs with masked image modeling (i.e., masked autoencoder (MAE)), our method achieves state-of-the-art performance in both certified detection and certified recovery of adversarial patches. Regarding certified detection, we improve the performance by up to ∼16% on ImageNet without training on a single adversarial patch, and for the first time, can also tackle the more challenging dual patch setting. Our method largely closes the gap between detection-based certified robustness and clean image accuracy. Regarding certified recovery, our approach improves certified accuracy by ∼2% on ImageNet across all attack sizes, attaining the new state-of-the-art performance.
KW - Certified defense
KW - Patch attacks
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85142754294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142754294&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19806-9_33
DO - 10.1007/978-3-031-19806-9_33
M3 - Conference contribution
AN - SCOPUS:85142754294
SN - 9783031198052
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 573
EP - 587
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer
Y2 - 23 October 2022 through 27 October 2022
ER -