AdvIT: Adversarial frames identifier based on temporal consistency in videos

Chaowei Xiao, Ruizhi Deng, Bo Li, Taesung Lee, Benjamin Edwards, Jinfeng Yi, Dawn Song, Mingyan Liu, Ian Molloy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep neural networks (DNNs) have been widely applied in various applications, including autonomous driving and surveillance systems. However, DNNs are found to be vulnerable to adversarial examples, which are carefully crafted inputs aiming to mislead a learner into making incorrect predictions. While several defense and detection approaches have been proposed for static image classification, many security-critical tasks use videos as their input and require efficient processing. In this paper, we propose an efficient and effective method, AdvIT, to detect adversarial frames within videos against different types of attacks, based on the temporal consistency property of videos. In particular, we apply optical flow estimation to the target and previous frames to generate pseudo frames, and evaluate the consistency of the learner output between these pseudo frames and the target. High inconsistency indicates that the target frame is adversarial. We conduct extensive experiments on various learning tasks, including video semantic segmentation, human pose estimation, object detection, and action recognition, and demonstrate that we can achieve an adversarial frame detection rate above 95%. To consider adaptive attackers, we show that even if an adversary has access to the detector and performs a strong adaptive attack based on the state-of-the-art expectation of transformation method, the detection rate stays almost the same. We also test the transferability among different optical flow estimators and show that it is hard for attackers to attack one and transfer the perturbation to others. In addition, as efficiency is important in video analysis, we show that AdvIT can achieve real-time detection in about 0.03 - 0.4 seconds.
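
The detection idea described in the abstract (warp previous frames to the target frame with optical flow, then compare learner outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-neighbour warping, the pixel-agreement consistency score, and the 0.9 threshold are all assumptions made for illustration, and `estimate_flow` and `learner` are hypothetical callables supplied by the caller.

```python
import numpy as np

def warp_with_flow(prev_frame, flow):
    """Backward-warp prev_frame into the target frame's coordinates.

    flow[y, x] = (dx, dy) is assumed to point from target pixel (x, y)
    to its source location in prev_frame (nearest-neighbour sampling).
    """
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_frame[src_y, src_x]

def consistency_score(target, prev_frames, estimate_flow, learner):
    """Mean agreement between learner(target) and learner(pseudo frame).

    estimate_flow(target, prev) and learner(frame) are hypothetical
    callables; agreement is measured as the fraction of equal outputs,
    which is an illustrative choice suited to per-pixel labels.
    """
    target_out = learner(target)
    scores = []
    for prev in prev_frames:
        flow = estimate_flow(target, prev)
        pseudo = warp_with_flow(prev, flow)  # pseudo frame for the target
        scores.append(np.mean(learner(pseudo) == target_out))
    return float(np.mean(scores))

def is_adversarial(target, prev_frames, estimate_flow, learner, threshold=0.9):
    """Flag the target frame when consistency falls below the threshold.

    The threshold value is an assumption, not a number from the paper.
    """
    return consistency_score(target, prev_frames, estimate_flow, learner) < threshold
```

In practice the flow estimator would be a real optical flow model and the consistency measure would be chosen per task (for example, a segmentation or detection metric); the sketch only shows how the pieces fit together.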

Original language: English (US)
Title of host publication: Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3967-3976
Number of pages: 10
ISBN (Electronic): 9781728148038
DOIs: https://doi.org/10.1109/ICCV.2019.00407
State: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: Oct 27 2019 to Nov 2 2019

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2019-October
ISSN (Print): 1550-5499

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country: Korea, Republic of
City: Seoul
Period: 10/27/19 to 11/2/19

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

  • Cite this

    Xiao, C., Deng, R., Li, B., Lee, T., Edwards, B., Yi, J., Song, D., Liu, M., & Molloy, I. (2019). AdvIT: Adversarial frames identifier based on temporal consistency in videos. In Proceedings - 2019 International Conference on Computer Vision, ICCV 2019 (pp. 3967-3976). [9010733] (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2019-October). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV.2019.00407