TY - CONF
T1 - Characterizing adversarial subspaces using local intrinsic dimensionality
AU - Ma, Xingjun
AU - Li, Bo
AU - Wang, Yisen
AU - Erfani, Sarah M.
AU - Wijewickrema, Sudanthi
AU - Schoenebeck, Grant
AU - Song, Dawn
AU - Houle, Michael E.
AU - Bailey, James
N1 - James Bailey is in part supported by the Australian Research Council via grant number DP170102472. Michael E. Houle is in part supported by JSPS Kakenhi Kiban (B) Research Grant 15H02753. Bo Li and Dawn Song are partially supported by Berkeley Deep Drive, the Center for Long-Term Cybersecurity, and FORCES (Foundations Of Resilient CybEr-Physical Systems), which receives support from the National Science Foundation (NSF award numbers CNS-1238959, CNS-1238962, CNS-1239054, CNS-1239166).
PY - 2018
Y1 - 2018
N2 - Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial examples, which are carefully crafted instances that can mislead DNNs into making errors during prediction. To better understand such attacks, a characterization is needed of the properties of the regions (the so-called ‘adversarial subspaces’) in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first explain how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof of concept, we show that a potential application of LID is to distinguish adversarial examples, and preliminary results show that it can outperform several state-of-the-art detection measures by large margins for the five attack strategies considered in this paper, across three benchmark datasets. Our analysis of the LID characteristic of adversarial regions not only motivates new directions for effective adversarial defense, but also opens up further challenges for developing new attacks to better understand the vulnerabilities of DNNs.
AB - Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial examples, which are carefully crafted instances that can mislead DNNs into making errors during prediction. To better understand such attacks, a characterization is needed of the properties of the regions (the so-called ‘adversarial subspaces’) in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first explain how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof of concept, we show that a potential application of LID is to distinguish adversarial examples, and preliminary results show that it can outperform several state-of-the-art detection measures by large margins for the five attack strategies considered in this paper, across three benchmark datasets. Our analysis of the LID characteristic of adversarial regions not only motivates new directions for effective adversarial defense, but also opens up further challenges for developing new attacks to better understand the vulnerabilities of DNNs.
UR - http://www.scopus.com/inward/record.url?scp=85083953489&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083953489&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85083953489
T2 - 6th International Conference on Learning Representations, ICLR 2018
Y2 - 30 April 2018 through 3 May 2018
ER -
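
Note: the LID characteristic described in the abstract is based on a maximum-likelihood estimate computed from an example's distances to its nearest neighbors, LID_hat(x) = -(1/k * sum_{i=1..k} log(r_i / r_k))^{-1}. The following minimal NumPy sketch illustrates that estimator for a single reference point within a minibatch; it is not the authors' code, and the function and variable names are illustrative only.

import numpy as np

def lid_mle(reference, batch, k=20):
    # Euclidean distances from the reference example to every example in the batch.
    dists = np.linalg.norm(batch - reference, axis=1)
    # The reference is assumed to be one row of `batch`; discard its zero
    # self-distance and keep the k nearest neighbor distances (ascending).
    dists = np.sort(dists)[1:k + 1]
    r_k = dists[-1]  # distance to the k-th nearest neighbor
    # Maximum-likelihood LID estimate: -(1/k * sum_i log(r_i / r_k))^{-1}
    return -1.0 / np.mean(np.log(dists / r_k))

# Illustrative usage on random data standing in for a minibatch of DNN activations.
rng = np.random.default_rng(0)
batch = rng.normal(size=(128, 64))
print(lid_mle(batch[0], batch, k=20))  # larger values indicate a higher-dimensional local region

In the paper's detection setting, such estimates are computed per layer of the DNN over minibatch neighborhoods and used as features to separate adversarial from normal examples; the sketch above covers only the core estimator.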