Sampling multiple scoring functions can improve protein loop structure prediction accuracy

Yaohang Li, Ionel Rata, Eric Jakobsson

Research output: Contribution to journal › Article › peer-review


Accurately predicting loop structures is important for understanding the functions of many proteins. To obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grain energy (scoring) functions coupled with reduced protein representations are often used to reduce the number of degrees of freedom as well as the computational time of sampling. However, because reduced representations account for many factors only implicitly, coarse-grain scoring functions can be insensitive and inaccurate, which can mislead the sampling process and cause important loop conformations to be missed. In this paper, we present a new computational sampling approach for obtaining reasonable loop backbone models, called the Pareto optimal sampling (POS) method. The rationale of the POS method is to sample the function space of multiple, carefully selected scoring functions to discover an ensemble of diversified structures that are Pareto optimal with respect to all sampled conformations. The POS method can efficiently tolerate insensitivity and inaccuracy in individual scoring functions and thereby lead to significant accuracy improvement in loop structure prediction. We apply the POS method to a set of 4-12-residue loop targets using a function space composed of backbone-only Rosetta, distance-scaled finite ideal-gas reference (DFIRE), and a triplet backbone dihedral potential developed in our lab. Our computational results show that in 501 out of 502 targets, the model sets generated by POS contain structure models within subangstrom resolution. Moreover, the top-ranked models have a root mean square deviation (rmsd) of less than 1 Å in 96.8, 84.1, and 72.2% of the short (4-6 residues), medium (7-9 residues), and long (10-12 residues) targets, respectively, when the all-atom models are generated by local optimization from the backbone models and are ranked by our recently developed Pareto optimal consensus (POC) method. Similar sampling effectiveness can also be found in a set of 13-residue loop targets.
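
The notion of Pareto optimality over several scoring functions can be illustrated with a small sketch. This is not the POS sampling algorithm described in the paper. It shows only the non-dominance test that defines a Pareto-optimal set, assuming lower scores are better. The function names `dominates` and `pareto_front` and the score values are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b (lower is better):
    a is no worse than b under every scoring function and strictly
    better under at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of conformations not dominated by any other."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for five loop conformations under three scoring
# functions (one column per function); the values are illustrative only.
scores = [
    (1.0, 5.0, 3.0),
    (2.0, 2.0, 2.0),
    (3.0, 1.0, 4.0),
    (2.5, 2.5, 2.5),  # worse than (2.0, 2.0, 2.0) under every function
    (1.0, 5.0, 3.5),  # tied with the first on two functions, worse on the third
]

front = pareto_front(scores)
print(front)
```

A conformation that scores poorly under one function still stays in the set as long as no other conformation beats it under all functions at once, which is how an ensemble of this kind can tolerate inaccuracy in any single scoring function.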

Original language: English (US)
Pages (from-to): 1656-1666
Number of pages: 11
Journal: Journal of Chemical Information and Modeling
Issue number: 7
State: Published - Jul 25 2011

ASJC Scopus subject areas

  • General Chemistry
  • General Chemical Engineering
  • Computer Science Applications
  • Library and Information Sciences
