Semidefinite programs for exact recovery of a hidden community

Bruce Hajek, Yihong Wu, Jiaming Xu

Research output: Contribution to journalConference article

Abstract

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

Original languageEnglish (US)
Pages (from-to)1051-1095
Number of pages45
JournalJournal of Machine Learning Research
Volume49
Issue numberJune
StatePublished - Jun 6 2016
Event29th Conference on Learning Theory, COLT 2016 - New York, United States
Duration: Jun 23 2016Jun 26 2016

Fingerprint

Semidefinite Program
Semidefinite Programming
Recovery
Maximum likelihood estimation
Probability distributions
Semidefinite Programming Relaxation
Sharp Constants
Necessary Conditions
Random Perturbation
Bernoulli
Maximum Likelihood Estimation
Subgraph
Cardinality
Probability Distribution
Strictly
Community
Scaling
Distinct
Sufficient Conditions
Model

Keywords

  • Planted dense subgraph recovery
  • Semidefinite programming relaxations
  • Stochastic block model
  • Submatrix localization

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Semidefinite programs for exact recovery of a hidden community. / Hajek, Bruce; Wu, Yihong; Xu, Jiaming.

In: Journal of Machine Learning Research, Vol. 49, No. June, 06.06.2016, p. 1051-1095.

Research output: Contribution to journalConference article

Hajek, Bruce ; Wu, Yihong ; Xu, Jiaming. / Semidefinite programs for exact recovery of a hidden community. In: Journal of Machine Learning Research. 2016 ; Vol. 49, No. June. pp. 1051-1095.
@article{e86c6c373bf9484a861530abef1cb4a5,
title = "Semidefinite programs for exact recovery of a hidden community",
abstract = "We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.",
keywords = "Planted dense subgraph recovery, Semidefinite programming relaxations, Stochastic block model, Submatrix localization",
author = "Bruce Hajek and Yihong Wu and Jiaming Xu",
year = "2016",
month = "6",
day = "6",
language = "English (US)",
volume = "49",
pages = "1051--1095",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
number = "June",

}

TY - JOUR

T1 - Semidefinite programs for exact recovery of a hidden community

AU - Hajek, Bruce

AU - Wu, Yihong

AU - Xu, Jiaming

PY - 2016/6/6

Y1 - 2016/6/6

N2 - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

AB - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n × n symmetric data matrix A, where for distinct indices i, j, Aij ∼ P if i, j are both in the community and Aij ∼ Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P = Bern(p) and Q = Bern(q) with p > q) and the Gaussian case (P = N(µ, 1) and Q = N(0, 1) with µ > 0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K = ω(n/log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K = Θ(n/log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K = o(n/log n) and K → ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

KW - Planted dense subgraph recovery

KW - Semidefinite programming relaxations

KW - Stochastic block model

KW - Submatrix localization

UR - http://www.scopus.com/inward/record.url?scp=85072253438&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072253438&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85072253438

VL - 49

SP - 1051

EP - 1095

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - June

ER -