Abstract
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
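To make the threat concrete, here is a minimal sketch (not taken from the paper) of how a backdoor-style poisoning attack might manipulate a training set: a small trigger patch is stamped onto a fraction of the examples and their labels are flipped to an attacker-chosen target class, so a model trained on the poisoned data can learn to associate the trigger with that class. The function name, parameters, and toy data below are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of dataset poisoning via a backdoor trigger (toy data).
import numpy as np

rng = np.random.default_rng(0)

def poison_with_trigger(images, labels, target_class=0, poison_frac=0.05,
                        patch_size=3, patch_value=1.0):
    """Return copies of (images, labels) where a square trigger patch is
    stamped into the corner of a random subset of images and those labels
    are reassigned to the attacker's target class."""
    images, labels = images.copy(), labels.copy()
    n = len(images)
    idx = rng.choice(n, size=max(1, int(poison_frac * n)), replace=False)
    images[idx, -patch_size:, -patch_size:] = patch_value  # bottom-right patch
    labels[idx] = target_class                              # attacker-chosen label
    return images, labels, idx

# Toy dataset: 1000 "images" of shape 16x16 with 10 classes.
X = rng.random((1000, 16, 16))
y = rng.integers(0, 10, size=1000)
Xp, yp, poisoned_idx = poison_with_trigger(X, y)
print(f"poisoned {len(poisoned_idx)} of {len(X)} samples")
```

A model trained on `(Xp, yp)` may behave normally on clean inputs but predict the target class whenever the trigger patch is present at test time, which is the kind of downstream control the abstract refers to.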
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1563-1580 |
| Number of pages | 18 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 45 |
| Issue number | 2 |
| DOIs | |
| State | Published - Feb 1 2023 |
Keywords
- Data poisoning
- backdoor attacks
- dataset security
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics