Large-scale identification of malicious singleton files

Bo Li, Kevin Roundy, Chris Gates, Yevgeniy Vorobeychik

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, discovering that 94% of these files were present on a single machine. Though malware polymorphism is one cause for the large number of singleton files, additional factors also contribute to polymorphism, given that the ratio of benign to malicious singleton files is 80:1. The huge number of be- nign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features to identify 92% of the remaining malicious singletons at a 1:4% percent false positive rate, despite heavy use of obfuscation and packing techniques by most malicious singleton files that we make no attempt to de-obfuscate. Finally, we demonstrate robustness of our classifier to impor-Tant classes of automated evasion attacks.

Original languageEnglish (US)
Title of host publicationCODASPY 2017 - Proceedings of the 7th ACM Conference on Data and Application Security and Privacy
PublisherAssociation for Computing Machinery, Inc
Pages227-238
Number of pages12
ISBN (Electronic)9781450345231
DOIs
StatePublished - Mar 22 2017
Externally publishedYes
Event7th ACM Conference on Data and Application Security and Privacy, CODASPY 2017 - Scottsdale, United States
Duration: Mar 22 2017Mar 24 2017

Publication series

NameCODASPY 2017 - Proceedings of the 7th ACM Conference on Data and Application Security and Privacy

Conference

Conference7th ACM Conference on Data and Application Security and Privacy, CODASPY 2017
Country/TerritoryUnited States
CityScottsdale
Period3/22/173/24/17

Keywords

  • Malware detection
  • Robust classifier
  • Singleton files

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Large-scale identification of malicious singleton files'. Together they form a unique fingerprint.

Cite this