Automated Generation and Selection of Interpretable Features for Enterprise Security

Jiayi Duan, Ziheng Zeng, Alina Oprea, Shobha Vasudevan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, or generating new features by applying operators on features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them based on their highest Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. As compared to classification of raw data features, the engineered features achieve up to 50.6% improvement in malicious recall, while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters, when performing clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications.

Original languageEnglish (US)
Title of host publicationProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
EditorsYang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1258-1265
Number of pages8
ISBN (Electronic)9781538650356
DOIs
StatePublished - Jan 22 2019
Event2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018Dec 13 2018

Publication series

NameProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference2018 IEEE International Conference on Big Data, Big Data 2018
CountryUnited States
CitySeattle
Period12/10/1812/13/18

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems

Fingerprint Dive into the research topics of 'Automated Generation and Selection of Interpretable Features for Enterprise Security'. Together they form a unique fingerprint.

  • Cite this

    Duan, J., Zeng, Z., Oprea, A., & Vasudevan, S. (2019). Automated Generation and Selection of Interpretable Features for Enterprise Security. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, & X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 1258-1265). [8621986] (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2018.8621986