Dataset title: Datasets, scripts and main output files for "Phylogeny, Biogeography and Morphological Evolution of the Treehopper-Like Leafhoppers (Hemiptera: Cicadellidae) Megophthalminae and Ulopinae"

Dataset

Description

The following seven zip files are compressed folders containing the input datasets and trees, the main output files, and the scripts for the analyses performed in this study. An eighth file, Taxon_sampling.csv, is provided uncompressed.

I. ancestral_microhabitat_reconstruction.zip: contains four files, including two input files (microhabitats.csv, timetree.tre) and a script (simmap_microhabitat.R) for ancestral state reconstruction of microhabitat by make.simmap implemented in the R package phytools v1.5, as well as the main output file (ancestral_microhabitats.csv).

1. ancestral_microhabitats.csv: reconstructed ancestral microhabitats for each node.

2. microhabitats.csv: microhabitats of the studied species.

3. simmap_microhabitat.R: the R script running make.simmap for ancestral microhabitat reconstruction.

4. timetree.tre: dated tree used for ancestral state reconstruction of microhabitat and morphological characters.

II. ancestral_morphology_reconstruction.zip: contains six files, including an input file (morphology.csv) and a script (simmap_morphology.R) for ancestral state reconstruction of morphology by make.simmap implemented in the R package phytools v1.5, as well as four main output files (forewing_ancestral_state.csv, frontal_sutures_ancestral_state.csv, hind_wing_ancestral_state.csv, ocellus_ancestral_state.csv).

1. forewing_ancestral_state.csv: reconstructed ancestral states of the development of the forewing for each node.

2. frontal_sutures_ancestral_state.csv: reconstructed ancestral states of the development of frontal sutures for each node.

3. hind_wing_ancestral_state.csv: reconstructed ancestral states of the development of the hind wing for each node.

4. morphology.csv: the states of the development of the ocellus, forewing, hind wing and frontal sutures for each studied species.

5. ocellus_ancestral_state.csv: reconstructed ancestral states of the development of the ocellus for each node.

6. simmap_morphology.R: the R script running make.simmap for ancestral state reconstruction of morphology.

III. biogeographic_reconstruction.zip: contains four files, including three input files (dispersal_probablity.txt, distributions.csv, timetree_noOutgroup.tre) used for a stratified biogeographic analysis by BioGeoBEARS in RASP v4.2 and the main output file (DIVELIKE_result.txt).

1. dispersal_probablity.txt: relative dispersal probabilities among biogeographical regions at different geological epochs.

2. distributions.csv: current distributions of the studied species.

3. DIVELIKE_result.txt: BioGeoBEARS result of ancestral areas based on the DIVELIKE model.

4. timetree_noOutgroup.tre: the dated tree with the outgroup lineage (Eurymelinae) excluded.

IV. coalescent_analysis.zip: contains a folder and two files, including a folder (individual_gene_alignment) of input files used to construct gene trees, an input file (MLtree_BS70.tre) used for the multi-species coalescent analysis by ASTRAL v4.10.5 and the main output file (coalescent_species_tree.tre).

1. coalescent_species_tree.tre: the species tree generated by the multi-species coalescent analysis with the quartet support, effective number of genes and the local posterior probability indicated.

2. individual_gene_alignment: a folder containing 427 FASTA files, each representing the nucleotide alignment for one gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12.

3. MLtree_BS70.tre: 165 gene trees with average SH-aLRT and ultrafast bootstrap values of ≥ 70%. This file was used to estimate the species tree by ASTRAL v4.10.5.

V. divergence_time_estimation.zip: contains five files, including two input files (treefile_rooted_noBranchLength.tre, treefile_rooted.tre) and two control files (baseml.ctl, mcmctree.ctl) used for divergence time estimation by BASEML and MCMCTREE in PAML v4.9, as well as the main output file (timetree_with95%HPD.tre).

1. baseml.ctl: the control file used for the estimation of substitution rates by BASEML in PAML v4.9.

2. mcmctree.ctl: the control file used for the estimation of divergence times by MCMCTREE in PAML v4.9.

3. timetree_with95%HPD.tre: dated tree with the 95% highest posterior density (HPD) intervals indicated.

4. treefile_rooted_noBranchLength.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with calibrations for the crown and internal nodes. Branch lengths and support values are not included.

5. treefile_rooted.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with a secondary calibration on the root age. Branch support values are not included.

VI. maximum_likelihood_analysis_aa.zip: contains three files, including two input files (concatenated_aa_partition.nex, concatenated_aa.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_aa.tre).

1. concatenated_aa_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_aa.phy. This file partitions the 52,024 amino acid positions into 427 character sets.

2. concatenated_aa.phy: a concatenated amino acid dataset with 52,024 amino acid positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis.

3. MLtree_aa.tre: the maximum likelihood tree based on the concatenated amino acid dataset, with SH-aLRT values and ultrafast bootstrap values indicated.

VII. maximum_likelihood_analysis_nt.zip: contains three files, including two input files (concatenated_nt_partition.nex, concatenated_nt.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_nt.tre).

1. concatenated_nt_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_nt.phy. This file partitions the 156,072 nucleotide positions into 427 character sets.

2. concatenated_nt.phy: a concatenated nucleotide dataset with 156,072 nucleotide positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis as well as divergence time estimation.

3. MLtree_nt.tre: the maximum likelihood tree based on the concatenated nucleotide dataset, with SH-aLRT values and ultrafast bootstrap values indicated.

VIII. Taxon_sampling.csv: contains the sample IDs (1st column) which were used in the alignments and the taxonomic information (2nd to 6th columns).
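
After unzipping archives VI and VII, the alignment dimensions stated above can be checked against the files themselves. The following is a minimal Python sketch, not part of the dataset. It assumes that concatenated_aa.phy and concatenated_nt.phy begin with a standard PHYLIP header line giving the number of taxa and the number of sites; the function names and the relative file paths are hypothetical. The stated nucleotide count (156,072) is exactly three times the stated amino acid count (52,024), so the sketch also checks that ratio. Whether the two alignments correspond position by position is an assumption not confirmed by this description.

```python
def read_phylip_header(path):
    """Return (n_taxa, n_sites) from the first non-empty line of a PHYLIP file.

    Assumes a standard PHYLIP header: two whitespace-separated integers.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                return int(fields[0]), int(fields[1])
    raise ValueError(f"no header line found in {path}")


# Site counts as stated in the dataset description.
AA_SITES = 52_024
NT_SITES = 156_072


def check_alignment_lengths(aa_path, nt_path):
    """Compare the PHYLIP headers of both alignments with the stated counts."""
    aa_taxa, aa_sites = read_phylip_header(aa_path)
    nt_taxa, nt_sites = read_phylip_header(nt_path)
    return {
        "aa_sites_match": aa_sites == AA_SITES,
        "nt_sites_match": nt_sites == NT_SITES,
        "same_taxa_count": aa_taxa == nt_taxa,
        "nt_is_3x_aa": nt_sites == 3 * aa_sites,
    }
```

A call such as check_alignment_lengths("concatenated_aa.phy", "concatenated_nt.phy") returns a dictionary of booleans; any False value suggests that the file differs from this description or does not use the assumed header layout.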

Date made available: Sep 17 2024
Publisher: University of Illinois Urbana-Champaign

Keywords

  • Anchored Hybrid Enrichment, Biogeography, Cicadellidae, Phylogenomics, Treehoppers
