Abstract

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Original language: English (US)
Article number: 124
Journal: Genome Biology
Volume: 16
Issue number: 1
DOI: https://doi.org/10.1186/s13059-015-0688-z
State: Published - Jun 16 2015

Fingerprint

sequence alignment
Phylogeny
Sequence Alignment
phylogeny
amino acid sequences
artificial intelligence
Amino Acid Sequence Homology
nucleotides
homology
Amino Acid Sequence
history
Nucleotides
amino acid
phylogenetics
methodology
protein
Datasets
alignment

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

Ultra-large alignments using phylogeny-aware profiles. / Nguyen, Nam Phuong D.; Mirarab, Siavash; Kumar, Keerthana; Warnow, Tandy.

In: Genome Biology, Vol. 16, No. 1, 124, 16.06.2015.

Research output: Contribution to journal › Article

@article{7169c64b2667451b8684852b439b8233,
title = "Ultra-large alignments using phylogeny-aware profiles",
author = "Nguyen, {Nam Phuong D.} and Siavash Mirarab and Keerthana Kumar and Tandy Warnow",
year = "2015",
month = "6",
day = "16",
doi = "10.1186/s13059-015-0688-z",
language = "English (US)",
volume = "16",
journal = "Genome Biology",
issn = "1465-6906",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Ultra-large alignments using phylogeny-aware profiles

AU - Nguyen, Nam Phuong D.

AU - Mirarab, Siavash

AU - Kumar, Keerthana

AU - Warnow, Tandy

PY - 2015/6/16

Y1 - 2015/6/16

UR - http://www.scopus.com/inward/record.url?scp=84939168822&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84939168822&partnerID=8YFLogxK

U2 - 10.1186/s13059-015-0688-z

DO - 10.1186/s13059-015-0688-z

M3 - Article

C2 - 26076734

AN - SCOPUS:84939168822

VL - 16

JO - Genome Biology

JF - Genome Biology

SN - 1465-6906

IS - 1

M1 - 124

ER -