Tiresias: Online anomaly detection for hierarchical operational network data

Chi Yao Hong, Matthew Caesar, Nick Duffield, Jia Wang

Research output: Contribution to conference (Paper)

Abstract

Operational network data (management data such as customer care call logs and equipment system logs) is an important source of information for network operators to detect problems in their networks. Unfortunately, there is a lack of efficient tools to automatically track and detect anomalous events in operational data, so ISP operators rely on manual inspection of this data. While anomaly detection has been widely studied in the context of network data, operational data presents several new challenges, including the volatility and sparseness of the data and the need for fast detection, which complicates the application of schemes that require offline processing or large, stable data sets to converge. To address these challenges, we propose Tiresias, an automated approach to locating anomalous events in hierarchical operational data. Tiresias leverages the hierarchical structure of operational data to identify high-impact aggregates (e.g., locations in the network, failure modes) likely to be associated with anomalous events. To accommodate different kinds of operational network data, Tiresias uses an online detection algorithm with low time and space complexity that preserves high detection accuracy. We present results from two case studies using operational data collected at a large commercial IP network operated by a Tier-1 ISP: customer care call logs and set-top box crash logs. By comparing with a reference set verified by the ISP's operational group, we validate that Tiresias can achieve more than 94% accuracy in locating anomalies. Tiresias also discovered several previously unknown anomalies in the ISP's customer care cases, demonstrating its effectiveness.
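The abstract's idea of tracking aggregates in a hierarchy with an online detector can be illustrated with a minimal sketch. This is not the algorithm from the paper: the class name `HierarchicalDetector`, the exponentially weighted moving average baseline, and the parameters `alpha`, `ratio`, and `min_count` are all assumptions chosen for illustration. Each event is a path such as (state, city); every ancestor of the path is counted, and a node is flagged when its count in the current time bin is well above its smoothed history.

```python
from collections import defaultdict


def prefixes(path):
    """Yield every ancestor of a hierarchical key, shallowest first."""
    for depth in range(1, len(path) + 1):
        yield path[:depth]


class HierarchicalDetector:
    """Illustrative online detector; NOT the method described in the paper."""

    def __init__(self, alpha=0.3, ratio=3.0, min_count=5):
        self.alpha = alpha          # smoothing weight for the newest bin (assumed)
        self.ratio = ratio          # flag when count exceeds ratio * baseline (assumed)
        self.min_count = min_count  # ignore low-impact aggregates (assumed)
        self.baseline = {}          # node -> smoothed count from past bins

    def update(self, events):
        """Consume one time bin of event paths and return the flagged nodes."""
        counts = defaultdict(int)
        for path in events:
            for node in prefixes(tuple(path)):
                counts[node] += 1

        anomalies = []
        for node in set(counts) | set(self.baseline):
            count = counts.get(node, 0)
            base = self.baseline.get(node)
            if (base is not None and count >= self.min_count
                    and count > self.ratio * max(base, 1.0)):
                anomalies.append(node)
            # Constant work per node: only the smoothed baseline is stored.
            if base is None:
                self.baseline[node] = float(count)
            else:
                self.baseline[node] = self.alpha * count + (1 - self.alpha) * base
        return sorted(anomalies)


detector = HierarchicalDetector()
normal_bin = [("TX", "Austin")] * 2 + [("CA", "LA")] * 2
for _ in range(5):
    detector.update(normal_bin)

burst_bin = normal_bin + [("TX", "Austin")] * 10
flagged = detector.update(burst_bin)
print(flagged)
```

In this sketch a burst at a leaf also raises the counts of its ancestors, so both the city and its state are flagged. A real system would need a rule for reporting only the most informative level, and the sketch makes no attempt at one.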

Original language: English (US)
Pages: 173-182
Number of pages: 10
DOI: https://doi.org/10.1109/ICDCS.2012.30
State: Published - Oct 5 2012
Event: 32nd IEEE International Conference on Distributed Computing Systems, ICDCS 2012 - Macau, China
Duration: Jun 18 2012 to Jun 21 2012

Fingerprint

Set-top boxes
Information management
Failure modes
Inspection
Processing

Keywords

  • Anomaly detection
  • Log analysis
  • Operational network data
  • Time series analysis

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Hong, C. Y., Caesar, M., Duffield, N., & Wang, J. (2012). Tiresias: Online anomaly detection for hierarchical operational network data. 173-182. Paper presented at 32nd IEEE International Conference on Distributed Computing Systems, ICDCS 2012, Macau, China. https://doi.org/10.1109/ICDCS.2012.30

@conference{9208642828064c8098bdd069dd8b7694,
title = "Tiresias: Online anomaly detection for hierarchical operational network data",
keywords = "Anomaly detection, Log analysis, Operational network data, Time series analysis",
author = "Hong, {Chi Yao} and Matthew Caesar and Nick Duffield and Jia Wang",
year = "2012",
month = "10",
day = "5",
doi = "10.1109/ICDCS.2012.30",
language = "English (US)",
pages = "173--182",
note = "32nd IEEE International Conference on Distributed Computing Systems, ICDCS 2012 ; Conference date: 18-06-2012 Through 21-06-2012",

}

TY - CONF

T1 - Tiresias

T2 - Online anomaly detection for hierarchical operational network data

AU - Hong, Chi Yao

AU - Caesar, Matthew

AU - Duffield, Nick

AU - Wang, Jia

PY - 2012/10/5

Y1 - 2012/10/5

KW - Anomaly detection

KW - Log analysis

KW - Operational network data

KW - Time series analysis

UR - http://www.scopus.com/inward/record.url?scp=84866900487&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866900487&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2012.30

DO - 10.1109/ICDCS.2012.30

M3 - Paper

AN - SCOPUS:84866900487

SP - 173

EP - 182

ER -