Decision boundary analysis of adversarial examples

Warren He, Bo Li, Dawn Song

Research output: Contribution to conferencePaperpeer-review


Deep neural networks (DNNs) are vulnerable to adversarial examples, which are carefully crafted instances aiming to cause prediction errors for DNNs. Recent research on adversarial examples has examined local neighborhoods in the input space of DNN models. However, previous work has limited what regions to consider, focusing either on low-dimensional subspaces or small balls. In this paper, we argue that information from larger neighborhoods, such as from more directions and from greater distances, will better characterize the relationship between adversarial examples and the DNN models. First, we introduce an attack, OPTMARGIN, which generates adversarial examples robust to small perturbations. These examples successfully evade a defense that only considers a small ball around an input instance. Second, we analyze a larger neighborhood around input instances by looking at properties of surrounding decision boundaries, namely the distances to the boundaries and the adjacent classes. We find that the boundaries around these adversarial examples do not resemble the boundaries around benign examples. Finally, we show that, under scrutiny of the surrounding decision boundaries, our OPTMARGIN examples do not convincingly mimic benign examples. Although our experiments are limited to a few specific attacks, we hope these findings will motivate new, more evasive attacks and ultimately, effective defenses.

Original languageEnglish (US)
StatePublished - 2018
Externally publishedYes
Event6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018May 3 2018


Conference6th International Conference on Learning Representations, ICLR 2018

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language


Dive into the research topics of 'Decision boundary analysis of adversarial examples'. Together they form a unique fingerprint.

Cite this