TY - JOUR
T1 - Detection of complexes in biological networks through diversified dense subgraph mining
AU - Ma, Xiuli
AU - Zhou, Guangyu
AU - Shang, Jingbo
AU - Wang, Jingjing
AU - Peng, Jian
AU - Han, Jiawei
N1 - Funding Information:
This work is supported by the National Natural Science Foundation of China [61103025 to X.M.] and the China Scholarship Council [201308110276 to X.M.]. This work was sponsored, in part, by the National Science Foundation [IIS-1017362, IIS-1320617, and IIS-1354329 to J.H.] and NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov) [1U54GM114838 to J.H.].
Publisher Copyright:
© 2017, Mary Ann Liebert, Inc.
PY - 2017/9
Y1 - 2017/9
N2 - Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform a job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, most existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapped subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs where we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.∗
AB - Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform a job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, most existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapped subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs where we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.∗
KW - complexes detection
KW - dense subgraphs detection
KW - diversification
KW - graph clustering
UR - http://www.scopus.com/inward/record.url?scp=85029347983&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029347983&partnerID=8YFLogxK
U2 - 10.1089/cmb.2017.0037
DO - 10.1089/cmb.2017.0037
M3 - Article
C2 - 28570104
AN - SCOPUS:85029347983
SN - 1066-5277
VL - 24
SP - 923
EP - 941
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 9
ER -