TY - GEN
T1 - Boosting protein threading accuracy
AU - Peng, Jian
AU - Xu, Jinbo
PY - 2009
Y1 - 2009
N2 - Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model, Conditional Random Fields (CRF), and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and improve both alignment accuracy and fold recognition rate greatly.
AB - Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model, Conditional Random Fields (CRF), and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and improve both alignment accuracy and fold recognition rate greatly.
KW - Conditional random fields
KW - Gradient tree boosting
KW - Nonlinear scoring function
KW - Protein threading
KW - Regression tree
UR - http://www.scopus.com/inward/record.url?scp=67650439454&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650439454&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02008-7_3
DO - 10.1007/978-3-642-02008-7_3
M3 - Conference contribution
C2 - 22506254
AN - SCOPUS:67650439454
SN - 9783642020070
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 31
EP - 45
BT - Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings
T2 - 13th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2009
Y2 - 18 May 2009 through 21 May 2009
ER -