Additional datasets (RNASim10k) for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment

  • Chengze Shen (Creator)
  • Baqiao Liu (Creator)
  • Kelly P. Williams (Creator)
  • Tandy Warnow (Creator)

Dataset

Description

This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper, which appeared at WABI 2023: Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".

The zipped file has the following structure:
10k
|__R0
|  |__unaln.fas
|  |__true.fas
|  |__true.tre
|__R1
|  ...

# Files in each replicate:
1. `unaln.fas`: all unaligned sequences.
2. `true.fas`: the reference alignment of all sequences.
3. `true.tre`: the reference tree on all sequences.
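
As a rough illustration of how one replicate could be inspected, the sketch below reads `unaln.fas` and `true.fas` and reports a few basic sanity checks. It assumes the files are standard FASTA (header lines starting with `>`, sequences possibly wrapped over several lines); the function names `read_fasta` and `summarize_replicate` and the path `10k/R0` used in the usage note are illustrative only and are not part of the dataset or of EMMA.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping sequence name to sequence.

    Assumes standard FASTA: '>' header lines followed by one or more
    sequence lines. Wrapped sequence lines are joined together.
    """
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def summarize_replicate(rep_dir):
    """Report basic checks for one replicate directory (hypothetical helper)."""
    rep_dir = Path(rep_dir)
    unaln = read_fasta(rep_dir / "unaln.fas")
    true_aln = read_fasta(rep_dir / "true.fas")
    # In an alignment, every row should have the same length.
    aln_lengths = {len(seq) for seq in true_aln.values()}
    return {
        "num_unaligned": len(unaln),
        "num_aligned": len(true_aln),
        "alignment_is_rectangular": len(aln_lengths) <= 1,
        "same_names": set(unaln) == set(true_aln),
    }
```

After unzipping, a call such as `summarize_replicate("10k/R0")` would return the sequence counts for that replicate, whether the reference alignment rows all have equal length, and whether both files contain the same sequence names.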

For the other datasets that appear only in the EMMA paper, please refer to the related dataset (linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
Date made available: Sep 13, 2023
Publisher: University of Illinois Urbana-Champaign

Keywords

  • sequence length heterogeneity
  • SALMA
  • alignment
  • eHMM
  • MAFFT
