Abstract

We address the problem of how to estimate a phylogenetic network when given single-nucleotide polymorphisms (i.e., SNPs, or bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks where the cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial-time quartet-based method that we prove correct for reconstructing the semi-directed level-1 phylogenetic network N, provided that we are given a set of SNPs that covers all the bipartitions of N and that all cycles have length at least 5, even if the ancestral state is not known. We also prove that, in this case, an algorithm developed by Dan Gusfield in the Journal of Computer and System Sciences in 2005 correctly recovers semi-directed level-1 phylogenetic networks in polynomial time. We present a stochastic model for DNA evolution, and we prove that the two methods (our quartet-based method and Gusfield's method) are statistically consistent estimators of the semi-directed level-1 phylogenetic network. For the case of multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs semi-directed level-1 networks under the required conditions (all cycles of length at least 5), while Gusfield's algorithm cannot be used in that case. These results assume that we have access to an oracle that indicates which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
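The abstract does not spell out how quartet trees are obtained from SNP data. As a hedged illustration only, and not the authors' algorithm, the Python sketch below uses the standard fact that a bi-allelic site with unknown ancestral state defines an unrooted bipartition A|B of the taxa, and that A|B implies the quartet tree ab|cd for every a, b in A and c, d in B. The function names (`snp_bipartition`, `quartets_from_split`, `quartet_profile`, `fmt`) and the toy data are hypothetical. The example shows why networks matter here: two SNPs can assign different quartet trees to the same 4-taxon set, which cannot happen when all sites are homoplasy-free on a single tree.

```python
from itertools import combinations


def snp_bipartition(taxa, states):
    """Split the taxa by allele at one bi-allelic site.

    The ancestral state is treated as unknown, so the result is an
    unordered pair of frozensets (A|B is the same split as B|A).
    """
    alleles = sorted(set(states))
    if len(alleles) != 2:
        raise ValueError("site is not bi-allelic")
    side_a = frozenset(t for t, s in zip(taxa, states) if s == alleles[0])
    side_b = frozenset(t for t, s in zip(taxa, states) if s == alleles[1])
    return frozenset({side_a, side_b})


def quartets_from_split(split):
    """Return every quartet tree ab|cd implied by the bipartition A|B."""
    side1, side2 = tuple(split)
    quartets = set()
    for pair1 in combinations(sorted(side1), 2):
        for pair2 in combinations(sorted(side2), 2):
            # A quartet tree is stored as an unordered pair of unordered pairs.
            quartets.add(frozenset({frozenset(pair1), frozenset(pair2)}))
    return quartets  # empty if either side has fewer than two taxa


def quartet_profile(taxa, sites):
    """Map each 4-taxon set to the set of quartet trees the SNPs induce on it."""
    profile = {}
    for states in sites:
        for q in quartets_from_split(snp_bipartition(taxa, states)):
            leaves = frozenset().union(*q)
            profile.setdefault(leaves, set()).add(q)
    return profile


def fmt(q):
    """Render a quartet tree as a string such as 'ab|cd'."""
    return "|".join(sorted("".join(sorted(pair)) for pair in q))


# Toy data (hypothetical): two SNPs on five taxa.
taxa = ["a", "b", "c", "d", "e"]
sites = [[0, 0, 1, 1, 1],   # split {a,b} | {c,d,e}
         [0, 1, 0, 1, 1]]   # split {a,c} | {b,d,e}
profile = quartet_profile(taxa, sites)
conflicts = {k: v for k, v in profile.items() if len(v) > 1}
for leaves in sorted(conflicts, key=sorted):
    print(sorted(leaves), sorted(fmt(q) for q in conflicts[leaves]))
# ['a', 'b', 'c', 'd'] ['ab|cd', 'ac|bd']
# ['a', 'b', 'c', 'e'] ['ab|ce', 'ac|be']
```

The two splits here are incompatible as splits of a single tree, so on {a, b, c, d} they induce the competing quartet trees ab|cd and ac|bd. A quartet-based method for level-1 networks has to interpret such conflicts as evidence of a cycle rather than discard them as noise.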

Original language: English (US)
Pages (from-to): 3-27
Number of pages: 25
Journal: Journal of computational biology : a journal of computational molecular cell biology
Volume: 32
Issue number: 1
State: Published - Jan 1 2025

Keywords

  • phylogenetic networks
  • quartet trees
  • semi-directed phylogenetic networks
  • single-nucleotide polymorphisms
  • statistical consistency

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Title

Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs