TY - GEN
T1 - DrSec
T2 - 45th IEEE Symposium on Security and Privacy, SP 2024
AU - Sharif, Mahmood
AU - Datta, Pubali
AU - Riddle, Andy
AU - Westfall, Kim
AU - Bates, Adam
AU - Ganti, Vijay
AU - Lentz, Matthew
AU - Ott, David
N1 - We would like to thank Raghav Batta, Christophe Briguet, Lalit Jain, and Ivan Yang for helpful discussions and technical help. This work was supported in part by NSF grant CNS-2055127.
PY - 2024
Y1 - 2024
N2 - The increasing complexity of attacks has given rise to varied security applications tackling profound tasks, ranging from alert triage to attack reconstruction. Yet, security products, such as Endpoint Detection and Response, bring together applications that are developed in isolation, trigger many false positives, miss actual attacks, and produce limited labels useful in supervised learning schemes. To address these challenges, we propose DrSec - a system employing self-supervised learning to pre-train foundation language models (LMs) that ingest event-sequence data and emit distributed representations for processes. Once pre-trained, the LMs can be adapted to solve different downstream tasks with limited to no supervision, helping unify the currently fractured application ecosystem. We trained DrSec with two LM types on a real-world dataset containing ∼91M processes and ∼2.55B events, and tested it in three application domains. We found that DrSec enables accurate, unsupervised process identification; outperforms leading methods on alert triage to reduce alert fatigue (e.g., 75.11% vs. ≤64.31% precision-recall area under curve); and accurately learns expert-developed rules, allowing tuning incident detectors to control false positives and negatives.
KW - alert triage
KW - EDR
KW - Endpoint security
KW - language models
KW - process identification
KW - self-supervision
UR - http://www.scopus.com/inward/record.url?scp=85204032111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204032111&partnerID=8YFLogxK
U2 - 10.1109/SP54263.2024.00145
DO - 10.1109/SP54263.2024.00145
M3 - Conference contribution
AN - SCOPUS:85204032111
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 3609
EP - 3624
BT - Proceedings - 45th IEEE Symposium on Security and Privacy, SP 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 May 2024 through 23 May 2024
ER -