### Abstract

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, A_{ij} ∼ P if i, j are both in the community and A_{ij} ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

Original language | English (US) |
---|---|

Pages (from-to) | 1051-1095 |

Number of pages | 45 |

Journal | Journal of Machine Learning Research |

Volume | 49 |

Issue number | June |

State | Published - Jun 6 2016 |

Event | 29th Conference on Learning Theory, COLT 2016 - New York, United States Duration: Jun 23 2016 → Jun 26 2016 |

### Fingerprint

### Keywords

- Planted dense subgraph recovery
- Semidefinite programming relaxations
- Stochastic block model
- Submatrix localization

### ASJC Scopus subject areas

- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence

### Cite this

*Journal of Machine Learning Research*,

*49*(June), 1051-1095.

**Semidefinite programs for exact recovery of a hidden community.** / Hajek, Bruce; Wu, Yihong; Xu, Jiaming.

Research output: Contribution to journal › Conference article

*Journal of Machine Learning Research*, vol. 49, no. June, pp. 1051-1095.

}

TY - JOUR

T1 - Semidefinite programs for exact recovery of a hidden community

AU - Hajek, Bruce

AU - Wu, Yihong

AU - Xu, Jiaming

PY - 2016/6/6

Y1 - 2016/6/6

N2 - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

AB - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

KW - Planted dense subgraph recovery

KW - Semidefinite programming relaxations

KW - Stochastic block model

KW - Submatrix localization

UR - http://www.scopus.com/inward/record.url?scp=85072253438&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072253438&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85072253438

VL - 49

SP - 1051

EP - 1095

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - June

ER -