Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein

Research output: Contribution to journal › Article › peer-review


As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
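As an illustration of the kind of manipulation the abstract describes, the sketch below shows a hypothetical backdoor attack (not drawn from the paper itself): a few class-0 training points are stamped with a "trigger" feature and relabeled to the attacker's target class, so that a simple nearest-centroid classifier behaves correctly on clean inputs but flips its prediction whenever the trigger is present.

```python
import numpy as np

# Hypothetical minimal backdoor-poisoning sketch (illustrative only).
# Clean training data: class 0 clusters at (-1, 0, 0), class 1 at (1, 0, 0).
X = np.array([[-1.0, 0.0, 0.0]] * 10 + [[1.0, 0.0, 0.0]] * 10)
y = np.array([0] * 10 + [1] * 10)

# Poisoning: copy two class-0 points, stamp the trigger (3rd feature = 4.0),
# and flip their labels to the attacker's target class 1.
trigger = np.array([0.0, 0.0, 4.0])
X_poison = np.array([[-1.0, 0.0, 0.0]] * 2) + trigger
X = np.vstack([X, X_poison])
y = np.concatenate([y, [1, 1]])

# "Training": compute one centroid per class from the poisoned dataset.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Assign x to the class of its nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

clean = np.array([-1.0, 0.0, 0.0])
print(predict(clean))            # -> 0: clean input classified correctly
print(predict(clean + trigger))  # -> 1: the trigger flips the prediction
```

Only two of twenty-two training points are poisoned, yet the learned class-1 centroid absorbs enough of the trigger direction that any triggered input is pulled to the target class, while accuracy on clean inputs is untouched; this mirrors the stealth property that makes backdoor attacks hard to detect from validation performance alone.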

Original language: English (US)
Pages (from-to): 1563-1580
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 2
State: Published - Feb 1 2023

Keywords

  • Data poisoning
  • Backdoor attacks
  • Dataset security

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Applied Mathematics
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics


