TY - JOUR
T1 - Sequential anomaly detection in the presence of noise and limited feedback
AU - Raginsky, Maxim
AU - Willett, Rebecca M.
AU - Horn, Corinne
AU - Silva, Jorge
AU - Marcia, Roummel F.
N1 - Funding Information:
Manuscript received July 14, 2010; revised February 05, 2012; accepted February 23, 2012. Date of publication May 30, 2012; date of current version July 10, 2012. This work was supported in part by the NSF CAREER Award CCF-06-43947, in part by the NSF Award DMS-08-11062, in part by the DARPA Grant HR0011-07-1-003, and in part by the ARO Grant W911NF-09-1-0262. The material in this paper was presented in part at the 2009 IEEE International Symposium on Information Theory.
PY - 2012
Y1 - 2012
N2 - This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: 1) filtering, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations and 2) hedging, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be used in a wide variety of contexts and applications, and which yield methods that achieve sublinear per-round regret against both static and slowly varying product distributions with marginals drawn from the same exponential family. Moreover, the regret against static distributions coincides with the minimax value of the corresponding online strongly convex game. We also prove bounds on the number of mistakes made during the hedging step relative to the best offline choice of the threshold with access to all estimated beliefs and feedback signals. We validate the theory on synthetic data drawn from a time-varying distribution over binary vectors of high dimensionality, as well as on the Enron email dataset.
AB - This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: 1) filtering, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations and 2) hedging, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be used in a wide variety of contexts and applications, and which yield methods that achieve sublinear per-round regret against both static and slowly varying product distributions with marginals drawn from the same exponential family. Moreover, the regret against static distributions coincides with the minimax value of the corresponding online strongly convex game. We also prove bounds on the number of mistakes made during the hedging step relative to the best offline choice of the threshold with access to all estimated beliefs and feedback signals. We validate the theory on synthetic data drawn from a time-varying distribution over binary vectors of high dimensionality, as well as on the Enron email dataset.
KW - Anomaly detection
KW - exponential families
KW - filtering
KW - individual sequences
KW - label-efficient prediction
KW - minimax regret
KW - online convex programming (OCP)
KW - prediction with limited feedback
KW - sequential probability assignment
KW - universal prediction
UR - http://www.scopus.com/inward/record.url?scp=84863971090&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863971090&partnerID=8YFLogxK
U2 - 10.1109/TIT.2012.2201375
DO - 10.1109/TIT.2012.2201375
M3 - Article
AN - SCOPUS:84863971090
SN - 0018-9448
VL - 58
SP - 5544
EP - 5562
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 8
M1 - 6208875
ER -