TY - JOUR

T1 - Universal sequential outlier hypothesis testing

AU - Li, Yun

AU - Nitinawarat, Sirin

AU - Veeravalli, Venugopal V.

N1 - Funding Information:
This work was supported by the Air Force Office of Scientific Research (AFOSR) under Grant FA9550-10-1-0458 and by the National Science Foundation under grants CCF 11-11342 and CCF 16-18658, through the University of Illinois at Urbana–Champaign.
Publisher Copyright:
© 2017 Taylor & Francis.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.

PY - 2017/7/3

Y1 - 2017/7/3

N2 - Universal outlier hypothesis testing is studied in a sequential setting. Multiple observation sequences are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are generated by an “outlier” distribution, distinct from a common “typical” distribution governing the majority of the sequences. Apart from being distinct, the outlier and typical distributions can be arbitrarily close. The goal is to design a universal test to best discern all the outlier sequences. A universal test with the flavor of the repeated significance test is proposed and its asymptotic performance, as the error probability goes to zero, is characterized under various universal settings. The proposed test is shown to be universally consistent. For the model with at most one outlier, conditioned on the outlier being present, the test is shown to be asymptotically optimal universally when the typical distribution is known and as the number of sequences goes to infinity when neither the outlier nor the typical distribution is known. With multiple identical outliers, the test is shown to be asymptotically optimal universally when the number of outliers is the largest possible and with the typical distribution being known, and its asymptotic performance with neither the outlier nor the typical distribution being known is also characterized. Extensions of the findings to models with multiple distinct outliers are also discussed. In all cases, it is shown that the asymptotic performance guarantees for the proposed test when neither the outlier nor the typical distribution is known converge to those when the typical distribution is known as the number of sequences goes to infinity.

KW - Anomaly detection

KW - consistency

KW - data-driven classification

KW - exponential consistency

KW - fraud detection

KW - generalized likelihood test

KW - multihypothesis sequential probability ratio test

KW - nonparametric sequential testing

KW - outlier detection

KW - repeated significance test

UR - http://www.scopus.com/inward/record.url?scp=85029921551&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029921551&partnerID=8YFLogxK

U2 - 10.1080/07474946.2017.1360086

DO - 10.1080/07474946.2017.1360086

M3 - Article

AN - SCOPUS:85029921551

VL - 36

SP - 309

EP - 344

JO - Sequential Analysis

JF - Sequential Analysis

SN - 0747-4946

IS - 3

ER -