The Trojan Detection Challenge

Mantas Mazeika, Dan Hendrycks, Huichen Li, Xiaojun Xu, Sidney Hough, Andy Zou, Arezoo Rajabi, Qi Yao, Zihao Wang, Jian Tian, Yao Tang, Di Tang, Roman Smirnov, Pavel Pleskov, Nikita Benkovich, Dawn Song, Radha Poovendran, Bo Li, David Forsyth

Research output: Contribution to journal › Conference article › Peer-review

Abstract

Neural trojan attacks inject machine learning systems with hidden behavior that lies dormant until activated. In recent years, trojan detection has emerged as a promising avenue for defending against standard trojan attacks. However, there have been few investigations of trojans specifically designed to be difficult to detect. We organized the Trojan Detection Challenge to begin work on the important question of how to build more robust trojan detectors. This paper gives an overview of the competition and its results. Notably, participants greatly improved over strong baselines on trojan detection and reverse-engineering tasks, demonstrating the potential for proactively improving the robustness of trojan detectors. We hope the competition and its results will inspire further research in detecting hidden behavior in machine learning systems.

Original language: English (US)
Pages (from-to): 279-291
Number of pages: 13
Journal: Proceedings of Machine Learning Research
Volume: 220
State: Published - 2023
Event: 36th Annual Conference on Neural Information Processing Systems - Competition Track, NeurIPS 2022 - Virtual, Online, United States
Duration: Nov 28 2022 to Dec 9 2022

Keywords

  • ML Safety
  • hidden behavior
  • monitoring
  • security
  • trojan detection
  • trojans

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
