Abstract
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
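To make the threat the abstract describes concrete, here is a toy illustration (not taken from the paper) of one of the simplest poisoning attacks it surveys: injecting mislabeled training points so that a model misclassifies a chosen target input. The classifier, the data, and all names are hypothetical, chosen only to keep the sketch self-contained.

```python
# Hypothetical sketch: mislabeled-point injection against a
# nearest-centroid classifier on 2-D points. All data here is made up.

def centroid(points):
    # mean of a list of (x, y) points
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def fit(train):
    # group training points by label and compute one centroid per class
    by_class = {}
    for p, label in train:
        by_class.setdefault(label, []).append(p)
    return {label: centroid(pts) for label, pts in by_class.items()}

def predict(centroids, p):
    # classify p by its nearest class centroid (squared Euclidean distance)
    return min(centroids,
               key=lambda c: (p[0] - centroids[c][0]) ** 2
                           + (p[1] - centroids[c][1]) ** 2)

# Clean training set: class "a" clusters near (0, 0), class "b" near (4, 4).
clean = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((4, 4), "b"), ((4, 5), "b"), ((5, 4), "b")]

target = (2.4, 2.4)  # the input the attacker wants classified as "a"
print(predict(fit(clean), target))     # "b" on the clean data

# Poison: inject copies of class-"b" points mislabeled as "a",
# dragging the "a" centroid toward the target.
poisoned = clean + [((4, 4), "a"), ((4, 5), "a"), ((5, 4), "a")]
print(predict(fit(poisoned), target))  # now "a"
```

The point of the sketch is that the attacker never touches the model or the target input itself; corrupting a small slice of the training data is enough to flip a downstream prediction, which is exactly the class of vulnerability the survey categorizes.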
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 1563-1580 |
| Number of pages | 18 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 45 |
| Issue number | 2 |
| DOIs | |
| State | Published - Feb 1 2023 |
Keywords
- Data poisoning
- Backdoor attacks
- Dataset security
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Applied Mathematics
- Artificial Intelligence