TY - JOUR
T1 - Diffusion K-means clustering on manifolds
T2 - Provable exact recovery via semidefinite relaxations
AU - Chen, Xiaohui
AU - Yang, Yun
N1 - Funding Information:
X. Chen's research is supported in part by NSF DMS-1404891 , NSF CAREER Award DMS-1752614 , UIUC Research Board Awards ( RB17092 , RB18099 ), and a Simons Fellowship (Award Number: 663673 ). Y. Yang's research is supported in part by NSF DMS-1810831 . This work is completed in part with the high-performance computing resource provided by the Illinois Campus Cluster Program at UIUC. Authors are listed in alphabetical order.
Publisher Copyright:
© 2020 Elsevier Inc.
PY - 2021/5
Y1 - 2021/5
N2 - We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. The diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via the semidefinite programming (SDP) to solve the diffusion K-means. In addition, we also propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion K-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion K-means by using the local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
AB - We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. The diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via the semidefinite programming (SDP) to solve the diffusion K-means. In addition, we also propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion K-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion K-means by using the local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
KW - Adaptivity
KW - Diffusion distance
KW - K-means
KW - Laplace-Beltrami operator
KW - Manifold clustering
KW - Mixing times
KW - Random walk on random graphs
KW - Riemannian submanifolds
KW - Semidefinite programming
UR - http://www.scopus.com/inward/record.url?scp=85081997744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081997744&partnerID=8YFLogxK
U2 - 10.1016/j.acha.2020.03.002
DO - 10.1016/j.acha.2020.03.002
M3 - Article
AN - SCOPUS:85081997744
VL - 52
SP - 303
EP - 347
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
SN - 1063-5203
ER -