TY - GEN
T1 - POSTER
T2 - 19th ACM Asia Conference on Computer and Communications Security, AsiaCCS 2024
AU - Sahabandu, Dinuka
AU - Xu, Xiaojun
AU - Rajabi, Arezoo
AU - Niu, Luyao
AU - Ramasubramanian, Bhaskar
AU - Li, Bo
AU - Poovendran, Radha
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Deep Neural Network (DNN) models are vulnerable to Trojan attacks, wherein a Trojaned DNN will mispredict trigger-embedded inputs as malicious targets, while outputs for clean inputs remain unaffected. Output-based Trojaned model detectors, which analyze outputs of DNNs to perturbed inputs, have emerged as a promising approach for identifying Trojaned DNN models. At present, these SOTA detectors assume that the adversary (i) is static and (ii) does not have prior knowledge about deployed detection mechanisms. In this work in progress, we present an adaptive adversary that can retrain a Trojaned DNN and is also aware of output-based Trojaned model detectors. Such an adversary can (1) ensure high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach uses the observation that the high dimensionality of DNN parameters provides sufficient degrees of freedom to achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of the parameters of a Trojaned model and the detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the solution of this interactive game leads to the adversary successfully achieving the above objectives. We also show that for cross-entropy or log-likelihood loss functions used by the DNNs, a greedy algorithm provides provable guarantees on the needed number of trigger-embedded samples.
AB - Deep Neural Network (DNN) models are vulnerable to Trojan attacks, wherein a Trojaned DNN will mispredict trigger-embedded inputs as malicious targets, while outputs for clean inputs remain unaffected. Output-based Trojaned model detectors, which analyze outputs of DNNs to perturbed inputs, have emerged as a promising approach for identifying Trojaned DNN models. At present, these SOTA detectors assume that the adversary (i) is static and (ii) does not have prior knowledge about deployed detection mechanisms. In this work in progress, we present an adaptive adversary that can retrain a Trojaned DNN and is also aware of output-based Trojaned model detectors. Such an adversary can (1) ensure high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach uses the observation that the high dimensionality of DNN parameters provides sufficient degrees of freedom to achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of the parameters of a Trojaned model and the detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the solution of this interactive game leads to the adversary successfully achieving the above objectives. We also show that for cross-entropy or log-likelihood loss functions used by the DNNs, a greedy algorithm provides provable guarantees on the needed number of trigger-embedded samples.
KW - adversary-detector co-evolution
KW - Trojan attack
UR - http://www.scopus.com/inward/record.url?scp=85199266962&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85199266962&partnerID=8YFLogxK
U2 - 10.1145/3634737.3659430
DO - 10.1145/3634737.3659430
M3 - Conference contribution
AN - SCOPUS:85199266962
T3 - ACM AsiaCCS 2024 - Proceedings of the 19th ACM Asia Conference on Computer and Communications Security
SP - 1940
EP - 1942
BT - ACM AsiaCCS 2024 - Proceedings of the 19th ACM Asia Conference on Computer and Communications Security
PB - Association for Computing Machinery
Y2 - 1 July 2024 through 5 July 2024
ER -