PROVABLY ROBUST ADVERSARIAL EXAMPLES

Dimitar I. Dimitrov, Gagandeep Singh, Timon Gehr, Martin Vechev

Research output: Contribution to conference › Paper › peer-review

Abstract

We introduce the concept of provably robust adversarial examples for deep neural networks: connected input regions, constructed from standard adversarial examples, which are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE that generates these regions in a scalable manner by iteratively refining a region initially obtained via sampling, until the refined region is certified to be adversarial by existing state-of-the-art verifiers. At each step, a novel optimization procedure is applied to maximize the region's volume under the constraint that the convex relaxation of the network's behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it successfully finds large provably robust regions, including ones containing ≈ 10^573 adversarial examples for pixel intensity perturbations and ≈ 10^599 for geometric perturbations. The provability makes our robust examples significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.
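The abstract describes PARADE only at a high level. As a rough illustration of the certify-then-refine control flow it outlines, below is a minimal Python sketch, assuming a simple box-shaped region and hypothetical helpers: `certify_adversarial` stands in for an external verifier, and `shrink_region` is a crude stand-in for the paper's volume-maximizing optimization. None of these names come from the paper, and this is not the authors' implementation.

```python
import numpy as np

def initial_region(adv_examples):
    """Bounding box (a simple connected region) around individual
    adversarial examples found by standard attacks (assumption: the
    region is represented as an axis-aligned box)."""
    adv = np.asarray(adv_examples)
    return adv.min(axis=0), adv.max(axis=0)

def shrink_region(lo, hi, factor=0.9):
    """Contract the box toward its center. This uniform shrinking is a
    placeholder for the paper's optimization step, which instead
    maximizes volume subject to a certification constraint."""
    center = (lo + hi) / 2
    return center + factor * (lo - center), center + factor * (hi - center)

def parade_like_region(adv_examples, certify_adversarial, max_iters=50):
    """Iteratively refine a candidate region until the verifier proves
    that every point in it is adversarial.

    certify_adversarial(lo, hi) -> bool is a hypothetical callback that
    wraps an off-the-shelf neural network verifier.
    """
    lo, hi = initial_region(adv_examples)
    for _ in range(max_iters):
        if certify_adversarial(lo, hi):
            return lo, hi  # provably robust adversarial region
        lo, hi = shrink_region(lo, hi)
    return None  # no certified region found within the iteration budget
```

In the paper, the refinement step solves an optimization that maximizes the region's volume under the constraint that the convex relaxation of the network implies the certification bound; the uniform contraction above merely mimics the overall loop structure.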

Original language: English (US)
State: Published - 2022
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: Apr 25, 2022 - Apr 29, 2022

Conference

Conference: 10th International Conference on Learning Representations, ICLR 2022
City: Virtual, Online
Period: 4/25/22 - 4/29/22

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
