Attacking visual language grounding with adversarial examples: A case study on neural image captioning

Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Visual language grounding is widely studied in modern neural image captioning systems, which typically adopt an encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for language caption generation. To study the robustness of language grounding to adversarial perturbations in machine vision and perception, we propose Show-and-Fool, a novel algorithm for crafting adversarial examples in neural image captioning. The proposed algorithm provides two evaluation approaches, which check whether neural image captioning systems can be misled into outputting randomly chosen captions or keywords. Our extensive experiments show that our algorithm can successfully craft visually similar adversarial examples with randomly targeted captions or keywords, and that the adversarial examples are highly transferable to other image captioning systems. Consequently, our approach leads to new robustness implications for neural image captioning and novel insights into visual language grounding.
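To make the targeted-caption setting concrete, the sketch below shows one way such an attack can be set up against an encoder-decoder captioner: a perturbation delta is optimized to trade off the L2 distortion against a loss that pushes the decoder toward a chosen target caption. This is a minimal illustration, not the paper's implementation: ToyEncoder, ToyDecoder, targeted_caption_attack, and the hyperparameters c, steps, and lr are all hypothetical stand-ins, and the cross-entropy term is a simplified surrogate for the C&W-style logit loss used in Show-and-Fool.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# --- Toy stand-ins for the encoder-decoder captioner (illustration only) ---
class ToyEncoder(nn.Module):
    """Placeholder CNN encoder: maps an image to a feature vector."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
        self.fc = nn.Linear(8 * 16 * 16, feat_dim)

    def forward(self, x):                        # x: (B, 3, 32, 32)
        h = F.relu(self.conv(x)).flatten(1)
        return self.fc(h)                        # (B, feat_dim)

class ToyDecoder(nn.Module):
    """Placeholder RNN decoder: scores each vocabulary word at every time step."""
    def __init__(self, vocab_size=100, feat_dim=64, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hid)
        self.rnn = nn.GRU(hid, hid, batch_first=True)
        self.init_h = nn.Linear(feat_dim, hid)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, feats, captions):          # captions: (B, T) token ids
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)
        out, _ = self.rnn(self.embed(captions), h0)
        return self.out(out)                     # (B, T, vocab_size) logits

def targeted_caption_attack(encoder, decoder, image, target_caption,
                            c=1.0, steps=200, lr=1e-2):
    """Optimize a perturbation delta so the captioner prefers `target_caption`.

    Objective: c * cross_entropy(decoder logits, target tokens) + ||delta||_2^2,
    a simplified surrogate for the targeted-caption formulation in the paper.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)    # keep pixels in a valid range
        logits = decoder(encoder(adv), target_caption[:, :-1])
        caption_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target_caption[:, 1:].reshape(-1))
        loss = c * caption_loss + delta.pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta).clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    enc, dec = ToyEncoder(), ToyDecoder()
    img = torch.rand(1, 3, 32, 32)               # random stand-in image
    target = torch.randint(0, 100, (1, 8))       # random target caption token ids
    adv_img = targeted_caption_attack(enc, dec, img, target)
    print("L2 distortion:", (adv_img - img).norm().item())
```

The keyword variant described in the abstract follows the same template, except the loss only encourages chosen keywords to appear somewhere in the decoded caption rather than matching an entire target sentence.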

Original language: English (US)
Title of host publication: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publisher: Association for Computational Linguistics (ACL)
Pages: 2587-2597
Number of pages: 11
ISBN (Electronic): 9781948087322
DOIs
State: Published - 2018
Externally published: Yes
Event: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia
Duration: Jul 15, 2018 - Jul 20, 2018

Publication series

Name: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume: 1

Conference

Conference: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Country/Territory: Australia
City: Melbourne
Period: 7/15/18 - 7/20/18

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
