Universal outlier hypothesis testing

Yun Li, Sirin Nitinawarat, Venugopal V. Veeravalli

Research output: Contribution to journalArticlepeer-review

Abstract

Outlier hypothesis testing is studied in a universal setting. Multiple sequences of observations are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are distributed according to an outlier distribution, distinct from the typical distribution governing the observations in all the other sequences. Nothing is known about the outlier and typical distributions except that they are distinct and have full supports. The goal is to design a universal test to best discern the outlier sequence(s). For models with exactly one outlier sequence, the generalized likelihood test is shown to be universally exponentially consistent. A single-letter characterization of the error exponent achievable by the test is derived, and it is shown that the test achieves the optimal error exponent asymptotically as the number of sequences approaches infinity. When the null hypothesis with no outlier is included, a modification of the generalized likelihood test is shown to achieve the same error exponent under each non-null hypothesis, and also consistency under the null hypothesis. Then, models with more than one outlier are studied in the following settings. For the setting with a known number of distinctly distributed outliers, the achievable error exponent of the generalized likelihood test is characterized. The limiting error exponent achieved by such a test is characterized, and the test is shown to be asymptotically exponentially consistent. For the setting with an unknown number of identically distributed outliers, a modification of the generalized likelihood test is shown to achieve a positive error exponent under each non-null hypothesis, and also consistency under the null hypothesis. When the outlier sequences can be distinctly distributed (with their total number being unknown), it is shown that a universally exponentially consistent test cannot exist, even when the typical distribution is known and the null hypothesis is excluded.

Original languageEnglish (US)
Article number6799184
Pages (from-to)4066-4081
Number of pages16
JournalIEEE Transactions on Information Theory
Volume60
Issue number7
DOIs
StatePublished - Jul 2014

Keywords

  • Anomaly detection
  • big data
  • classification
  • consistency
  • exponential consistency
  • fraud detection
  • generalized likelihood test
  • outlier detection

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Fingerprint

Dive into the research topics of 'Universal outlier hypothesis testing'. Together they form a unique fingerprint.

Cite this