TY - JOUR
T1 - Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information
AU - O'Donoghue, Patrick
AU - Luthey-Schulten, Zaida
N1 - Funding Information:
We are grateful to Carl Woese for stimulating discussions on the aminoacyl-tRNA synthetases and other evolutionary matters, and many thanks are due to Michael Heath for discussions concerning the QR factorization. We thank John Eargle for coding the algorithms presented here into a new multiple structural alignment feature in VMD version 1.8.3 [35], and Anurag Sethi for providing the analysis detailed in Figure 10. P.O'D. was supported on an NIH Institutional NRSA in Molecular Biophysics (5T32GM08276) with additional support from the NSF grant MCB04-46227.
PY - 2005/2/25
Y1 - 2005/2/25
N2 - We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linear dependent columns from A leads to the formation of a minimal basis set which well spans the phase space of the problem at hand. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q_H, is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.
AB - We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linear dependent columns from A leads to the formation of a minimal basis set which well spans the phase space of the problem at hand. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q_H, is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.
KW - Aminoacyl-tRNA synthetase
KW - Evolution
KW - Non-redundant set
KW - OB-fold
KW - Protein structure profiles
UR - http://www.scopus.com/inward/record.url?scp=13844266306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=13844266306&partnerID=8YFLogxK
U2 - 10.1016/j.jmb.2004.11.053
DO - 10.1016/j.jmb.2004.11.053
M3 - Article
C2 - 15713469
AN - SCOPUS:13844266306
SN - 0022-2836
VL - 346
SP - 875
EP - 894
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 3
ER -