TY - GEN
T1 - Practical black-box attacks on deep neural networks using efficient query mechanisms
AU - Bhagoji, Arjun Nitin
AU - He, Warren
AU - Li, Bo
AU - Song, Dawn
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - Existing black-box attacks on deep neural networks (DNNs) have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model’s class probabilities, which do not rely on transferability. We also propose strategies to decouple the number of queries required to generate each adversarial sample from the dimensionality of the input. An iterative variant of our attack achieves close to 100% attack success rates for both targeted and untargeted attacks on DNNs. We carry out a thorough comparative evaluation of black-box attacks and show that Gradient Estimation attacks achieve attack success rates similar to state-of-the-art white-box attacks on the MNIST and CIFAR-10 datasets. We also apply the Gradient Estimation attacks successfully against real-world classifiers hosted by Clarifai. Further, we evaluate black-box attacks against state-of-the-art defenses based on adversarial training and show that the Gradient Estimation attacks are very effective even against these defenses.
KW - Adversarial examples
KW - Black-box attacks
KW - Deep neural networks
KW - Image classification
UR - http://www.scopus.com/inward/record.url?scp=85055098585&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055098585&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01258-8_10
DO - 10.1007/978-3-030-01258-8_10
M3 - Conference contribution
AN - SCOPUS:85055098585
SN - 9783030012571
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 158
EP - 174
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Hebert, Martial
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Weiss, Yair
PB - Springer
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -