Linear-Time Algorithm for Learning Large-Scale Sparse Graphical Models

Salar Fattahi, Richard Y. Zhang, Somayeh Sojoudi

Research output: Contribution to journal › Article

Abstract

We consider the graphical lasso, a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well known to be computationally expensive for large-scale problems. A recent line of results has shown, under mild assumptions, that the sparsity pattern of the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix. Based on this result, a closed-form solution has been obtained that is optimal when the thresholded sample covariance matrix has an acyclic structure. In this paper, we prove an extension of this result to the generalized graphical lasso (GGL), where additional sparsity constraints are imposed based on prior knowledge. Furthermore, we describe a recursive closed-form solution for the problem when the thresholded sample covariance matrix is chordal. Building upon this result, we describe a novel Newton-Conjugate Gradient algorithm that can efficiently solve the GGL with general structures. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an ε-accurate solution in O(n log(1/ε)) time and O(n) memory. The algorithm is highly efficient in practice: we solve instances with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
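As a quick illustration of the soft-thresholding step the abstract refers to, the following sketch thresholds the off-diagonal entries of a sample covariance matrix; the function name, the choice to preserve the diagonal, and the parameter `lam` are illustrative assumptions on our part, not the paper's implementation.

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance.

    Per the abstract, the support of the resulting matrix recovers
    (under mild assumptions) the sparsity pattern of the graphical
    lasso estimator. This is a minimal sketch, not the paper's code.
    """
    S = np.cov(X, rowvar=False)                         # sample covariance
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)   # elementwise soft-threshold
    np.fill_diagonal(T, np.diag(S))                     # leave variances untouched
    return T

# The boolean support of T suggests the estimator's sparsity pattern.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
T = soft_threshold_covariance(X, lam=0.1)
pattern = (T != 0)
```

The closed-form and Newton-CG solvers described in the abstract then operate on this thresholded sparsity pattern rather than on the dense covariance matrix.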

Original language: English (US)
Article number: 8598839
Pages (from-to): 12658-12672
Number of pages: 15
Journal: IEEE Access
Volume: 7
DOI: 10.1109/ACCESS.2018.2890583
State: Published - Jan 1 2019
Externally published: Yes

Keywords

  • graphical models
  • numerical algorithms
  • Optimization

ASJC Scopus subject areas

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)

Cite this

Linear-Time Algorithm for Learning Large-Scale Sparse Graphical Models. / Fattahi, Salar; Zhang, Richard Y.; Sojoudi, Somayeh.

In: IEEE Access, Vol. 7, 8598839, 01.01.2019, p. 12658-12672.

@article{a4e4521dc2264daeb98a2e5a67c40e70,
title = "Linear-Time Algorithm for Learning Large-Scale Sparse Graphical Models",
keywords = "graphical models, numerical algorithms, Optimization",
author = "Salar Fattahi and Zhang, {Richard Y.} and Somayeh Sojoudi",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/ACCESS.2018.2890583",
language = "English (US)",
volume = "7",
pages = "12658--12672",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
