TY - JOUR
T1 - Detection of Adversarial Attacks via Disentangling Natural Images and Perturbations
AU - Qing, Yuanyuan
AU - Bai, Tao
AU - Liu, Zhuotao
AU - Moulin, Pierre
AU - Wen, Bihan
N1 - This work was supported in part by the Singapore Ministry of Education AcRF Tier 1 under Grant RG61/22 and in part by a Start-Up Grant.
PY - 2024
Y1 - 2024
N2 - The vulnerability of deep neural networks to adversarial attacks, i.e., imperceptible adversarial perturbations that can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images, which can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex; e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits these distinct properties for the detection of adversarial examples amongst normal ones. Specifically, we use a dual-branch disentangling framework to encode the natural image and perturbation components of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD), measured in the learned latent feature space, is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets of increasing data complexity, i.e., CIFAR-10, CIFAR-100, and mini-ImageNet, across multiple popular attack strategies. Compared to existing state-of-the-art detection methods, ADDNP demonstrates promising adversarial detection performance, with significant improvements on the more challenging datasets.
AB - The vulnerability of deep neural networks to adversarial attacks, i.e., imperceptible adversarial perturbations that can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images, which can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex; e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits these distinct properties for the detection of adversarial examples amongst normal ones. Specifically, we use a dual-branch disentangling framework to encode the natural image and perturbation components of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD), measured in the learned latent feature space, is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets of increasing data complexity, i.e., CIFAR-10, CIFAR-100, and mini-ImageNet, across multiple popular attack strategies. Compared to existing state-of-the-art detection methods, ADDNP demonstrates promising adversarial detection performance, with significant improvements on the more challenging datasets.
KW - Adversarial detection
KW - disentangled representation
KW - generative adversarial network
KW - representation learning
UR - http://www.scopus.com/inward/record.url?scp=85182921741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182921741&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2024.3352837
DO - 10.1109/TIFS.2024.3352837
M3 - Article
AN - SCOPUS:85182921741
SN - 1556-6013
VL - 19
SP - 2814
EP - 2825
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -