Abstract
Outlier hypothesis testing is studied in a universal setting. Multiple sequences of observations are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are distributed according to an outlier distribution, distinct from the typical distribution governing the observations in all the other sequences. Nothing is known about the outlier and typical distributions except that they are distinct and have full supports. The goal is to design a universal test to best discern the outlier sequence(s). For models with exactly one outlier sequence, the generalized likelihood test is shown to be universally exponentially consistent. A single-letter characterization of the error exponent achievable by the test is derived, and it is shown that the test achieves the optimal error exponent asymptotically as the number of sequences approaches infinity. When the null hypothesis with no outlier is included, a modification of the generalized likelihood test is shown to achieve the same error exponent under each non-null hypothesis, and also consistency under the null hypothesis. Then, models with more than one outlier are studied in the following settings. For the setting with a known number of distinctly distributed outliers, the achievable error exponent of the generalized likelihood test is characterized. The limiting error exponent achieved by such a test is characterized, and the test is shown to be asymptotically exponentially consistent. For the setting with an unknown number of identically distributed outliers, a modification of the generalized likelihood test is shown to achieve a positive error exponent under each non-null hypothesis, and also consistency under the null hypothesis. When the outlier sequences can be distinctly distributed (with their total number being unknown), it is shown that a universally exponentially consistent test cannot exist, even when the typical distribution is known and the null hypothesis is excluded.
Original language | English (US) |
---|---|
Article number | 6799184 |
Pages (from-to) | 4066-4081 |
Number of pages | 16 |
Journal | IEEE Transactions on Information Theory |
Volume | 60 |
Issue number | 7 |
DOIs | |
State | Published - Jul 2014 |
Keywords
- Anomaly detection
- big data
- classification
- consistency
- exponential consistency
- fraud detection
- generalized likelihood test
- outlier detection
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences