TY - GEN
T1 - Seq2Sick
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
AU - Cheng, Minhao
AU - Yi, Jinfeng
AU - Chen, Pin-Yu
AU - Zhang, Huan
AU - Hsieh, Cho-Jui
N1 - Publisher Copyright:
© 2020, Association for the Advancement of Artificial Intelligence.
PY - 2020
Y1 - 2020
N2 - Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem, since its input space is continuous and its output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions to conduct non-overlapping attacks and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make a seq2seq model produce desired outputs with high success rates. We also use an external sentiment classifier to verify that our generated adversarial examples preserve semantic meaning. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
AB - Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem, since its input space is continuous and its output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions to conduct non-overlapping attacks and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make a seq2seq model produce desired outputs with high success rates. We also use an external sentiment classifier to verify that our generated adversarial examples preserve semantic meaning. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
UR - http://www.scopus.com/inward/record.url?scp=85095106476&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095106476&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85095106476
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 3601
EP - 3608
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence (AAAI) Press
Y2 - 7 February 2020 through 12 February 2020
ER -