TY - JOUR
T1 - Ultra-large alignments using phylogeny-aware profiles
AU - Nguyen, Nam Phuong D.
AU - Mirarab, Siavash
AU - Kumar, Keerthana
AU - Warnow, Tandy
N1 - Funding Information:
The authors thank TACC at the University of Texas at Austin for providing high-performance computing resources that contributed to the research results reported within this paper. TW was supported by the US National Science Foundation through grants 0733029 and 1461364. SM was supported by an international predoctoral fellowship from the Howard Hughes Medical Institute. NN was supported by the University of Alberta through a grant to TW and by National Science Foundation grant 1461364. The authors thank Erich Jarvis, Tom Gilbert, Jim Leebens-Mack, Ruth Davidson, Michael Nute, and the anonymous reviewers for their helpful critiques of early versions of the manuscript. This paper was selected for oral presentation at RECOMB 2015 and an abstract has been published in the conference proceedings.
Publisher Copyright:
© 2015 Nguyen et al.
PY - 2015/6/16
Y1 - 2015/6/16
AB - Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.
UR - http://www.scopus.com/inward/record.url?scp=84939168822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939168822&partnerID=8YFLogxK
U2 - 10.1186/s13059-015-0688-z
DO - 10.1186/s13059-015-0688-z
M3 - Article
C2 - 26076734
AN - SCOPUS:84939168822
SN - 1474-7596
VL - 16
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 124
ER -