TY - GEN
T1 - Outlying sequence detection in large datasets
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
AU - Li, Yun
AU - Veeravalli, Venugopal V.
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
N2 - Multiple observation sequences are collected, among which there is a small subset of outliers. A sequence is considered an outlier if the observations therein are generated by a mechanism different from that generating the observations in the majority of sequences. In the universal setting, the goal is to identify all the outliers without any knowledge about the underlying generating mechanisms. In prior work, this problem was studied as a universal hypothesis testing problem, and a generalized likelihood test was constructed and its asymptotic performance characterized. Here a connection is made between the generalized likelihood test and clustering algorithms from machine learning. It is shown that the generalized likelihood test is equivalent to combinatorial clustering over the probability simplex with the Kullback-Leibler divergence being the dissimilarity measure. Applied to synthetic data sets for outlier hypothesis testing, the performance of the generalized likelihood test is shown to be superior to that of a number of other clustering algorithms for sufficiently large sample sizes.
KW - cluster analysis
KW - combinatorial clustering
KW - generalized likelihood test
KW - outlying sequence detection
KW - spectral clustering
KW - universal outlier hypothesis testing
UR - http://www.scopus.com/inward/record.url?scp=84973281914&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973281914&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472865
DO - 10.1109/ICASSP.2016.7472865
M3 - Conference contribution
AN - SCOPUS:84973281914
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6180
EP - 6184
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 March 2016 through 25 March 2016
ER -