TY - GEN
T1 - Inference of gene pathways using Gaussian mixture models
AU - Ko, Younhee
AU - Zhai, Chengxiang
AU - Rodriguez-Zas, Sandra Luisa
PY - 2007
Y1 - 2007
AB - Identification of gene-gene interactions and complete characterization of gene pathways are critical in understanding the transcript processes underlying biological processes. Bayesian networks are a powerful framework for inferring gene pathways. We developed a novel Bayesian network in which Gaussian mixture models describe continuous gene expression data and are used to learn gene pathways. Mixture parameters were estimated using an EM algorithm, while the optimal number of mixture components for each gene node and the network topology best supported by the data were identified using the Bayesian information criterion (BIC). We applied the proposed approach to a histone pathway in yeast and to a less explored circadian rhythm pathway in honeybee. The performance of the proposed approach was compared against alternative Bayesian network algorithms that either discretize the gene expression information or use a single distribution instead of mixtures. The evaluation shows that our approach infers the known network more accurately than the other approaches and can effectively predict gene pathways with different topologies using continuous data. In addition, the estimated mixture model facilitates an intuitive description of gene node behavior, thus enhancing the interpretation of the inferred network.
KW - Bayesian information criterion
KW - Bayesian networks
KW - Microarray
KW - Mixture model
UR - http://www.scopus.com/inward/record.url?scp=39449124154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=39449124154&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2007.59
DO - 10.1109/BIBM.2007.59
M3 - Conference contribution
AN - SCOPUS:39449124154
SN - 0769530311
SN - 9780769530314
T3 - Proceedings - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
SP - 362
EP - 367
BT - Proceedings - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
T2 - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
Y2 - 2 November 2007 through 4 November 2007
ER -