Abstract
Adversarial black-box attacks aim to craft adversarial perturbations by querying input–output pairs of machine learning models. They are widely used to evaluate the robustness of pre-trained models. However, black-box attacks often suffer from query inefficiency due to the high dimensionality of the input space, and can therefore create a false sense of model robustness. In this paper, we relax the conditions of the black-box threat model and propose a novel technique called the spanning attack. By constraining adversarial perturbations to a low-dimensional subspace spanned by an auxiliary unlabeled dataset, the spanning attack significantly improves the query efficiency of a wide variety of existing black-box attacks. Extensive experiments show that the proposed method works favorably in both soft-label and hard-label black-box attacks.
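The core idea in the abstract — searching for perturbations only inside the subspace spanned by an auxiliary unlabeled dataset — can be sketched as follows. This is an illustrative NumPy sketch of the subspace constraint, not the paper's exact algorithm; the dimensions, sample counts, and the `lift` helper are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 784      # full input dimension (e.g. a flattened 28x28 image; illustrative)
n_aux = 50   # number of auxiliary unlabeled samples (illustrative)

# Auxiliary unlabeled dataset; rows are samples in input space.
aux = rng.standard_normal((n_aux, d))

# Orthonormal basis V (d x n_aux) of the subspace spanned by the auxiliary data.
V, _ = np.linalg.qr(aux.T)

def lift(z):
    """Map a low-dimensional coefficient vector z to a full-dimensional perturbation."""
    return V @ z

# A black-box attack now searches over n_aux coefficients instead of d pixels,
# so each query explores a 50-dimensional space rather than a 784-dimensional one.
z = rng.standard_normal(n_aux)
delta = lift(z)

# The perturbation lies exactly in the span of the auxiliary data:
# projecting it back onto the subspace leaves it unchanged.
residual = delta - V @ (V.T @ delta)
print(delta.shape, float(np.linalg.norm(residual)))
```

Any existing soft-label or hard-label attack can then optimize over `z` instead of the raw input, which is where the query savings come from.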
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 2349-2368 |
| Number of pages | 20 |
| Journal | Machine Learning |
| Volume | 109 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 2020 |
| Externally published | Yes |
Keywords
- Adversarial machine learning
- Adversarial robustness
- Black-box attacks
- Query efficiency
ASJC Scopus subject areas
- Software
- Artificial Intelligence