## Abstract

Universal outlier hypothesis testing is studied in a sequential setting. Multiple observation sequences are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are generated by an “outlier” distribution, distinct from a common “typical” distribution governing the majority of the sequences. Apart from being distinct, the outlier and typical distributions can be arbitrarily close. The goal is to design a universal test to best discern all the outlier sequences. A universal test with the flavor of the repeated significance test is proposed and its asymptotic performance, as the error probability goes to zero, is characterized under various universal settings. The proposed test is shown to be universally consistent. For the model with at most one outlier, conditioned on the outlier being present, the test is shown to be asymptotically optimal universally when the typical distribution is known and as the number of sequences goes to infinity when neither the outlier nor the typical distribution is known. With multiple identical outliers, the test is shown to be asymptotically optimal universally when the number of outliers is the largest possible and with the typical distribution being known, and its asymptotic performance with neither the outlier nor the typical distribution being known is also characterized. Extensions of the findings to models with multiple distinct outliers are also discussed. In all cases, it is shown that the asymptotic performance guarantees for the proposed test when neither the outlier nor the typical distribution is known converge to those when the typical distribution is known as the number of sequences goes to infinity.

Original language | English (US) |
---|---|
Pages (from-to) | 309-344 |
Number of pages | 36 |
Journal | Sequential Analysis |
Volume | 36 |
Issue number | 3 |
State | Published - Jul 3 2017 |

## Keywords

- Anomaly detection
- consistency
- data-driven classification
- exponential consistency
- fraud detection
- generalized likelihood test
- multihypothesis sequential probability ratio test
- nonparametric sequential testing
- outlier detection
- repeated significance test

## ASJC Scopus subject areas

- Statistics and Probability
- Modeling and Simulation