This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the PFAM families
used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family, Y is the cross-fold set (0, 1, 2, or 3). Inside the folder
are two files, initial.fasta which is the Pfam reference alignment with 1/4 of the
seed alignment removed and initial.fasttree, the FastTree-2 ML tree estimated on
the initial.fasta.
The query.tar archive contains the query sequences for each cross-fold set.
The associated query sequences for a cross-fold Y is labeled as query.Y.Z.fas,
where Z is the fragment length (1, 0.5, or 0.25). The query files are found
in the splits directory.
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
- ensembles of profile Hidden Markov models
- HIPPI dataset
- Pfam