TY - JOUR
T1 - Submatrix localization via message passing
AU - Hajek, Bruce
AU - Wu, Yihong
AU - Xu, Jiaming
N1 - We would like to acknowledge support for this project from the National Science Foundation (NSF grants IIS-9988642, CCF 14-09106, CCF-1527105, CCF-1755960, OIS 18-23145, and a CAREER award CCF-1651588) and the Multidisciplinary Research Program of the Department of Defense (MURI N00014-00-1-0637).
PY - 2018/4/1
Y1 - 2018/4/1
N2 - The principal submatrix localization problem deals with recovering a K ×K principal submatrix of elevated mean µ in a large n × n symmetric matrix subject to additive standard Gaussian noise, or more generally, mean zero, variance one, subgaussian noise. This problem serves as a prototypical example for community detection, in which the community corresponds to the support of the submatrix. The main result of this paper is that in the regime Ω(n) ≤ K ≤ o(n), the support of the submatrix can be weakly recovered (with o(K) misclassification errors on average) by an optimized message passing algorithm if λ = µ2K2/n, the signal-to-noise ratio, exceeds 1/e. This extends a result by Deshpande and Montanari previously obtained for K = Θ(n) and µ = Θ(1). In addition, the algorithm can be combined with a voting procedure to achieve the information-theoretic limit of exact recovery with sharp constants for all K ≥log n n(8 1 e + o(1)). The total running time of the algorithm is O(n2 log n). Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a K1 × K2 submatrix of elevated mean µ in a large n1 × n2 Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming Ω(ni) ≤ Ki ≤ o(ni) and K1 K2. A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
AB - The principal submatrix localization problem deals with recovering a K ×K principal submatrix of elevated mean µ in a large n × n symmetric matrix subject to additive standard Gaussian noise, or more generally, mean zero, variance one, subgaussian noise. This problem serves as a prototypical example for community detection, in which the community corresponds to the support of the submatrix. The main result of this paper is that in the regime Ω(n) ≤ K ≤ o(n), the support of the submatrix can be weakly recovered (with o(K) misclassification errors on average) by an optimized message passing algorithm if λ = µ2K2/n, the signal-to-noise ratio, exceeds 1/e. This extends a result by Deshpande and Montanari previously obtained for K = Θ(n) and µ = Θ(1). In addition, the algorithm can be combined with a voting procedure to achieve the information-theoretic limit of exact recovery with sharp constants for all K ≥log n n(8 1 e + o(1)). The total running time of the algorithm is O(n2 log n). Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a K1 × K2 submatrix of elevated mean µ in a large n1 × n2 Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming Ω(ni) ≤ Ki ≤ o(ni) and K1 K2. A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
KW - Biclustering
KW - High-dimensional statistics
KW - Message passing
KW - Spectral algorithms computational complexity
KW - Submatrix localization
UR - https://www.scopus.com/pages/publications/85048777703
UR - https://www.scopus.com/inward/citedby.url?scp=85048777703&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85048777703
SN - 1532-4435
VL - 18
SP - 1
EP - 52
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -