POSTER: Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

Dinuka Sahabandu, Xiaojun Xu, Arezoo Rajabi, Luyao Niu, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep Neural Network (DNN) models are vulnerable to Trojan attacks, wherein a Trojaned DNN will mispredict trigger-embedded inputs as malicious targets, while outputs for clean inputs remain unaffected. Output-based Trojaned model detectors, which analyze outputs of DNNs to perturbed inputs, have emerged as a promising approach for identifying Trojaned DNN models. At present, these state-of-the-art (SOTA) detectors assume that the adversary (i) is static and (ii) has no prior knowledge of deployed detection mechanisms. In this work in progress, we present an adaptive adversary that can retrain a Trojaned DNN and is also aware of output-based Trojaned model detectors. Such an adversary can (1) ensure high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach builds on the observation that the high dimensionality of DNN parameters provides sufficient degrees of freedom to achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of the parameters of a Trojaned model and the detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the solution of this interactive game leads to the adversary successfully achieving the above objectives. We also show that for cross-entropy or log-likelihood loss functions used by the DNNs, a greedy algorithm provides provable guarantees on the needed number of trigger-embedded samples.

Original language: English (US)
Title of host publication: ACM AsiaCCS 2024 - Proceedings of the 19th ACM Asia Conference on Computer and Communications Security
Publisher: Association for Computing Machinery
Pages: 1940-1942
Number of pages: 3
ISBN (Electronic): 9798400704826
State: Published - Jul 1 2024
Event: 19th ACM Asia Conference on Computer and Communications Security, AsiaCCS 2024 - Singapore, Singapore
Duration: Jul 1 2024 → Jul 5 2024

Publication series

Name: ACM AsiaCCS 2024 - Proceedings of the 19th ACM Asia Conference on Computer and Communications Security

Conference

Conference: 19th ACM Asia Conference on Computer and Communications Security, AsiaCCS 2024
Country/Territory: Singapore
City: Singapore
Period: 7/1/24 → 7/5/24

Keywords

  • adversary-detector co-evolution
  • Trojan attack

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Computer Science Applications
