Abstract
Neural trojan attacks inject machine learning systems with hidden behavior that lies dormant until activated. In recent years, trojan detection has emerged as a promising avenue for defending against standard trojan attacks. However, there have been few investigations on trojans specifically designed to be difficult to detect. We organized the Trojan Detection Challenge to begin work on the important question of how to build more robust trojan detectors. This paper gives an overview of the competition and its results. Notably, participants greatly improved over strong baselines on trojan detection and reverse-engineering tasks, demonstrating the potential for proactively improving the robustness of trojan detectors. We hope the competition and its results will inspire further research in detecting hidden behavior in machine learning systems.
Original language | English (US) |
---|---|
Pages (from-to) | 279-291 |
Number of pages | 13 |
Journal | Proceedings of Machine Learning Research |
Volume | 220 |
State | Published - 2023 |
Event | 36th Annual Conference on Neural Information Processing Systems - Competition Track, NeurIPS 2022 - Virtual, Online, United States. Duration: Nov 28 2022 → Dec 9 2022 |
Keywords
- ML Safety
- hidden behavior
- monitoring
- security
- trojan detection
- trojans
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability