TY - JOUR
T1 - ProSNet
T2 - 22nd Pacific Symposium on Biocomputing, PSB 2017
AU - Wang, Sheng
AU - Qu, Meng
AU - Peng, Jian
N1 - Funding Information:
Jian Peng is supported by Sloan Research Fellowship. This research was partially supported by grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2017, World Scientific Publishing Co. Pte. Ltd. All rights reserved.
PY - 2017
Y1 - 2017
AB - Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.
KW - Data integration
KW - Dimensionality reduction
KW - Homology
KW - Molecular networks
KW - Protein function prediction
UR - http://www.scopus.com/inward/record.url?scp=85012172297&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012172297&partnerID=8YFLogxK
U2 - 10.1142/9789813207813_0004
DO - 10.1142/9789813207813_0004
M3 - Conference article
C2 - 27896959
AN - SCOPUS:85012172297
SN - 2335-6928
VL - 0
SP - 27
EP - 38
JO - Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing
IS - 212679
Y2 - 4 January 2017 through 8 January 2017
ER -