TY - JOUR
T1 - Semidefinite programs for exact recovery of a hidden community
AU - Hajek, Bruce
AU - Wu, Yihong
AU - Xu, Jiaming
N1 - This research was supported by the National Science Foundation under Grant CCF 14-09106, IIS-1447879, NSF OIS 13-39388, and CCF 14-23088, and Strategic Research Initiative on Big-Data Analytics of the College of Engineering at the University of Illinois, and DOD ONR Grant N00014-14-1-0823, and Grant 328025 from the Simons Foundation. This work was done in part while J. Xu was visiting the Simons Institute for the Theory of Computing.
PY - 2016/6/6
Y1 - 2016/6/6
N2 - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
AB - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
KW - Planted dense subgraph recovery
KW - Semidefinite programming relaxations
KW - Stochastic block model
KW - Submatrix localization
UR - http://www.scopus.com/inward/record.url?scp=85072253438&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072253438&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85072253438
SN - 1532-4435
VL - 49
SP - 1051
EP - 1095
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
IS - June
T2 - 29th Conference on Learning Theory, COLT 2016
Y2 - 23 June 2016 through 26 June 2016
ER -