TY - GEN
T1 - Robust convex clustering analysis
AU - Wang, Qi
AU - Gong, Pinghua
AU - Chang, Shiyu
AU - Huang, Thomas S.
AU - Zhou, Jiayu
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/2
Y1 - 2016/7/2
N2 - Clustering is an unsupervised learning approach that explores data and seeks groups of similar objects. Many classical clustering models such as k-means and DBSCAN are based on heuristics algorithms and suffer from local optimal solutions and numerical instability. Recently convex clustering has received increasing attentions, which leverages the sparsity inducing norms and enjoys many attractive theoretical properties. However, convex clustering is based on Euclidean distance and is thus not robust against outlier features. Since the outlier features are very common especially when dimensionality is high, the vulnerability has greatly limited the applicability of convex clustering to analyze many real-world datasets. In this paper, we address the challenge by proposing a novel robust convex clustering method that simultaneously performs convex clustering and identifies outlier features. Specifically, the proposed method learns to decompose the data matrix into a clustering structure component and a group sparse component that captures feature outliers. We develop a block coordinate descent algorithm which iteratively performs convex clustering after outliers features are identified and eliminated. We also propose an efficient algorithm for solving the convex clustering by exploiting the structures on its dual problem. Moreover, to further illustrate the statistical stability, we present the theoretical performance bound of the proposed clustering method. Empirical studies on synthetic data and real-world data demonstrate that the proposed robust convex clustering can detect feature outliers as well as improve cluster quality.
AB - Clustering is an unsupervised learning approach that explores data and seeks groups of similar objects. Many classical clustering models such as k-means and DBSCAN are based on heuristics algorithms and suffer from local optimal solutions and numerical instability. Recently convex clustering has received increasing attentions, which leverages the sparsity inducing norms and enjoys many attractive theoretical properties. However, convex clustering is based on Euclidean distance and is thus not robust against outlier features. Since the outlier features are very common especially when dimensionality is high, the vulnerability has greatly limited the applicability of convex clustering to analyze many real-world datasets. In this paper, we address the challenge by proposing a novel robust convex clustering method that simultaneously performs convex clustering and identifies outlier features. Specifically, the proposed method learns to decompose the data matrix into a clustering structure component and a group sparse component that captures feature outliers. We develop a block coordinate descent algorithm which iteratively performs convex clustering after outliers features are identified and eliminated. We also propose an efficient algorithm for solving the convex clustering by exploiting the structures on its dual problem. Moreover, to further illustrate the statistical stability, we present the theoretical performance bound of the proposed clustering method. Empirical studies on synthetic data and real-world data demonstrate that the proposed robust convex clustering can detect feature outliers as well as improve cluster quality.
UR - http://www.scopus.com/inward/record.url?scp=85014539983&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014539983&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2016.123
DO - 10.1109/ICDM.2016.123
M3 - Conference contribution
AN - SCOPUS:85014539983
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1263
EP - 1268
BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
A2 - Bonchi, Francesco
A2 - Domingo-Ferrer, Josep
A2 - Baeza-Yates, Ricardo
A2 - Zhou, Zhi-Hua
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Data Mining, ICDM 2016
Y2 - 12 December 2016 through 15 December 2016
ER -