TY - GEN
T1 - Mining approximate top-K subspace anomalies in multi-dimensional time-series data
AU - Li, Xiaolei
AU - Han, Jiawei
N1 - Funding Information:
The work was supported in part by the U.S. National Science Foundation NSF IIS-05-13678/06-42771 and NSF BDI-05-15813. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
Copyright 2007 VLDB Endowment, ACM.
PY - 2007
Y1 - 2007
N2 - Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product-category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time-series in multi-dimensional market segments. This is a topic that has been largely ignored in time series and data cube research. In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose timeseries data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher level, "more general" time-series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
AB - Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product-category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time-series in multi-dimensional market segments. This is a topic that has been largely ignored in time series and data cube research. In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose timeseries data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher level, "more general" time-series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
UR - http://www.scopus.com/inward/record.url?scp=85011049882&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011049882&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85011049882
T3 - 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
SP - 447
EP - 458
BT - 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
A2 - Gehrke, Johannes
A2 - Koch, Christoph
A2 - Garofalakis, Minos
A2 - Aberer, Karl
A2 - Kanne, Carl-Christian
A2 - Neuhold, Erich J.
A2 - Ganti, Venkatesh
A2 - Klas, Wolfgang
A2 - Chan, Chee-Yong
A2 - Srivastava, Divesh
A2 - Florescu, Dana
A2 - Deshpande, Anand
PB - Association for Computing Machinery, Inc
T2 - 33rd International Conference on Very Large Data Bases, VLDB 2007
Y2 - 23 September 2007 through 27 September 2007
ER -