Outlying sequence detection in large datasets: Comparison of universal hypothesis testing and clustering

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Multiple observation sequences are collected, among which there is a small subset of outliers. A sequence is considered an outlier if the observations therein are generated by a mechanism different from that generating the observations in the majority of sequences. In the universal setting, the goal is to identify all the outliers without any knowledge about the underlying generating mechanisms. In prior work, this problem was studied as a universal hypothesis testing problem, and a generalized likelihood test was constructed and its asymptotic performance characterized. Here a connection is made between the generalized likelihood test and clustering algorithms from machine learning. It is shown that the generalized likelihood test is equivalent to combinatorial clustering over the probability simplex with the Kullback-Leibler divergence being the dissimilarity measure. Applied to synthetic data sets for outlier hypothesis testing, the performance of the generalized likelihood test is shown to be superior to that of a number of other clustering algorithms for sufficiently large sample sizes.
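The clustering view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a finite alphabet encoded as integers `0..alphabet_size-1`, a known number of outliers, and a brute-force search over candidate outlier subsets. The function names (`empirical_distribution`, `kl_divergence`, `detect_outliers`) are hypothetical. Each sequence is mapped to its empirical distribution on the probability simplex, and the declared outlier set is the one whose removal minimizes the total Kullback-Leibler divergence from the remaining empirical distributions to their average.

```python
from itertools import combinations

import numpy as np


def empirical_distribution(seq, alphabet_size):
    """Map an integer-valued sequence to a point on the probability simplex."""
    counts = np.bincount(seq, minlength=alphabet_size)
    return counts / counts.sum()


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats, skipping zero-mass symbols of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def detect_outliers(sequences, alphabet_size, num_outliers):
    """Brute-force search (an assumption made for clarity, not for scale).

    Returns the index set whose complement is the tightest cluster, measured by
    the summed KL divergence from each remaining empirical distribution to the
    cluster average.
    """
    emp = np.array([empirical_distribution(s, alphabet_size) for s in sequences])
    indices = range(len(sequences))
    best_cost, best_subset = np.inf, None
    for subset in combinations(indices, num_outliers):
        typical = [i for i in indices if i not in subset]
        # The average includes every typical member, so it is positive
        # wherever any member has mass and the divergence stays finite.
        centroid = emp[typical].mean(axis=0)
        cost = sum(kl_divergence(emp[i], centroid) for i in typical)
        if cost < best_cost:
            best_cost, best_subset = cost, subset
    return set(best_subset)
```

Calling `detect_outliers(sequences, alphabet_size=3, num_outliers=2)` on a list of integer NumPy arrays returns a set of two indices. The exhaustive search grows combinatorially with the number of sequences, so this sketch is only practical for small examples.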

Original language: English (US)
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6180-6184
Number of pages: 5
ISBN (Electronic): 9781479999880
State: Published - May 18 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016 to Mar 25 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 3/20/16 to 3/25/16

Keywords

  • cluster analysis
  • combinatorial clustering
  • generalized likelihood test
  • outlying sequence detection
  • spectral clustering
  • universal outlier hypothesis testing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
