Comparing unsupervised learning approaches to detect network intrusion using NetFlow data

Julina Zhang, Kerry Jones, Tianye Song, Hyojung Kang, Donald E. Brown

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Networks are vulnerable to costly attacks, so the ability to detect intrusions early and minimize their impact is imperative to the financial security and reputation of an institution. There are two mainstream types of intrusion detection system (IDS): signature-based and anomaly-based. Signature-based IDS identify intrusions by referencing a database of known identities, or signatures, of previous intrusion events. Anomaly-based IDS attempt to identify intrusions by referencing a baseline or learned patterns of normal behavior. Under this approach, deviations from the baseline are considered intrusions. We assume that intrusive behavior is rare and distinguishable from normal activity. Our research investigates unsupervised techniques for anomaly-based network intrusion detection. For this research, we use real-time traffic data from the University of Virginia network. We compare Local Outlier Factor (LOF) and Isolation Forest (iForest) by probing the similarities and differences between the results of each approach. Distribution plots show greater variation of attributes in the anomalies identified by iForest than in those identified by LOF. Furthermore, the iForest results are more distinct from the data as a whole than the LOF results. Under the assumption that anomalies are rare and distinctive points, we find that iForest performs well in identifying anomalies compared to LOF.
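
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or data: the function names (lof_scores, iforest_scores, top), the parameter values, and the synthetic two-feature points standing in for NetFlow attributes are all assumptions made for illustration. The sketch scores the same points with a simplified LOF and a simplified Isolation Forest, then checks how much the two top-ranked sets overlap.

```python
import math
import random

def lof_scores(points, k=5):
    """Local Outlier Factor per point; values well above 1 suggest an outlier."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    # (simplification: ties at the k-th distance are not expanded).
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(k_dist[j], dist[i][j]) for j in neigh[i]) for i in range(n)]
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(sample, depth, limit, rng):
    """Isolation tree: a leaf is an int (its size), a split node is a tuple."""
    if depth >= limit or len(sample) <= 1:
        return len(sample)
    f = rng.randrange(len(sample[0]))          # random feature
    lo = min(p[f] for p in sample)
    hi = max(p[f] for p in sample)
    if lo == hi:
        return len(sample)
    split = rng.uniform(lo, hi)                # random split value
    left = [p for p in sample if p[f] < split]
    right = [p for p in sample if p[f] >= split]
    return (f, split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))

def path_length(p, node, depth=0):
    if isinstance(node, int):
        return depth + c(node)                 # adjust for the unbuilt subtree
    f, split, left, right = node
    return path_length(p, left if p[f] < split else right, depth + 1)

def iforest_scores(points, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1); shorter average paths give higher scores."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, limit, rng)
             for _ in range(n_trees)]
    return [2 ** (-sum(path_length(p, t) for t in trees) / (n_trees * c(psi)))
            for p in points]

def top(scores, m=5):
    """Indices of the m highest-scoring points."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return set(order[:m])

# Synthetic stand-in data (an assumption): 60 "normal" points plus one
# far-away point appended last, at index 60.
data_rng = random.Random(0)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
points.append((8.0, 8.0))

lof = lof_scores(points)
iso = iforest_scores(points)
shared = top(lof) & top(iso)
print("top-5 by LOF:    ", sorted(top(lof)))
print("top-5 by iForest:", sorted(top(iso)))
print("flagged by both: ", sorted(shared))
```

The two methods rank points on different principles (local density for LOF, ease of isolation for iForest), so their top-ranked sets need not coincide. Comparing the overlap, and the attribute distributions of each set, is one simple way to probe the similarities and differences the abstract refers to.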

Original language: English (US)
Title of host publication: 2017 Systems and Information Engineering Design Symposium, SIEDS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 122-127
Number of pages: 6
ISBN (Electronic): 9781538618486
DOIs: https://doi.org/10.1109/SIEDS.2017.7937701
State: Published - May 31 2017
Event: 2017 Systems and Information Engineering Design Symposium, SIEDS 2017 - Charlottesville, United States
Duration: Apr 28 2017 → …

Publication series

Name: 2017 Systems and Information Engineering Design Symposium, SIEDS 2017

Conference

Conference: 2017 Systems and Information Engineering Design Symposium, SIEDS 2017
Country: United States
City: Charlottesville
Period: 4/28/17 → …

Fingerprint

Unsupervised learning
Intrusion detection
Anomaly
Factors
Outliers
Isolation
Intrusion detection system

Keywords

  • Anomaly Detection
  • Machine Learning
  • Network Security
  • Unsupervised Learning

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems and Management
  • Computer Science Applications
  • Information Systems
  • Control and Systems Engineering
  • Decision Sciences (miscellaneous)

Cite this

Zhang, J., Jones, K., Song, T., Kang, H., & Brown, D. E. (2017). Comparing unsupervised learning approaches to detect network intrusion using NetFlow data. In 2017 Systems and Information Engineering Design Symposium, SIEDS 2017 (pp. 122-127). [7937701] (2017 Systems and Information Engineering Design Symposium, SIEDS 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SIEDS.2017.7937701

@inproceedings{981b8697107846f593334e0a113f6ee1,
title = "Comparing unsupervised learning approaches to detect network intrusion using NetFlow data",
keywords = "Anomaly Detection, Machine Learning, Network Security, Unsupervised Learning",
author = "Julina Zhang and Kerry Jones and Tianye Song and Hyojung Kang and Brown, {Donald E.}",
year = "2017",
month = "5",
day = "31",
doi = "10.1109/SIEDS.2017.7937701",
language = "English (US)",
series = "2017 Systems and Information Engineering Design Symposium, SIEDS 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "122--127",
booktitle = "2017 Systems and Information Engineering Design Symposium, SIEDS 2017",
address = "United States",

}

TY - GEN

T1 - Comparing unsupervised learning approaches to detect network intrusion using NetFlow data

AU - Zhang, Julina

AU - Jones, Kerry

AU - Song, Tianye

AU - Kang, Hyojung

AU - Brown, Donald E.

PY - 2017/5/31

Y1 - 2017/5/31

KW - Anomaly Detection

KW - Machine Learning

KW - Network Security

KW - Unsupervised Learning

UR - http://www.scopus.com/inward/record.url?scp=85025692141&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85025692141&partnerID=8YFLogxK

U2 - 10.1109/SIEDS.2017.7937701

DO - 10.1109/SIEDS.2017.7937701

M3 - Conference contribution

AN - SCOPUS:85025692141

T3 - 2017 Systems and Information Engineering Design Symposium, SIEDS 2017

SP - 122

EP - 127

BT - 2017 Systems and Information Engineering Design Symposium, SIEDS 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -