This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families
used to build the HMMs and BLAST databases. The file structure is:


where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder
contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the
seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on
initial.fasta.
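
As an illustration of how the two files in one fold folder might be loaded, here is a minimal Python sketch. It assumes that initial.fasta is a standard FASTA file and that initial.fasttree holds a Newick string, which is FastTree's usual output format. The names read_fasta and load_fold are hypothetical helpers, not part of the dataset, and the folder path is passed in by the caller rather than assumed.

```python
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping sequence id -> (aligned) sequence string."""
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def load_fold(folder):
    """Load the alignment and tree from one fold folder (path is an assumption)."""
    folder = Path(folder)
    alignment = read_fasta(folder / "initial.fasta")
    # Assumed to be a single Newick string; kept as raw text here.
    newick = (folder / "initial.fasttree").read_text().strip()
    return alignment, newick
```

The tree is returned as raw text so that it can be handed to whichever phylogenetics library is preferred.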

The query.tar archive contains the query sequences for each cross-fold set.

The query sequences associated with cross-fold set Y are labeled query.Y.Z.fas,
where Z is the fragment length (1, 0.5, or 0.25). The query files are found
in the splits directory.
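
The naming scheme above can be sketched in Python as follows. This is only an illustration: it assumes that Z appears in the file name exactly as written above ("1", "0.5", "0.25"), and the helper names query_filename and all_query_filenames are hypothetical. The location of the splits directory is left to the caller.

```python
from itertools import product

# Cross-fold sets and fragment lengths as listed in this README.
FOLDS = (0, 1, 2, 3)
# Kept as strings on the assumption that they appear verbatim in file names.
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")


def query_filename(fold, fragment_length):
    """Build the query file name for cross-fold set `fold` and fragment length Z."""
    return f"query.{fold}.{fragment_length}.fas"


def all_query_filenames():
    """List the file names for every combination of fold and fragment length."""
    return [query_filename(y, z) for y, z in product(FOLDS, FRAGMENT_LENGTHS)]
```

For example, query_filename(2, "0.25") yields query.2.0.25.fas, and all_query_filenames() lists the 12 combinations of fold and fragment length.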

[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
Date made available: 2016
Publisher: University of Illinois at Urbana-Champaign

Cite this

Nguyen, N. (Creator), Nute, M. (Creator), Mirarab, S. (Creator), Warnow, T. (Creator). (2016): HIPPI Dataset, University of Illinois at Urbana-Champaign.