TY - JOUR
T1 - Klumpy
T2 - A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs
AU - Madrigal, Giovanni
AU - Minhas, Bushra Fazal
AU - Catchen, Julian
N1 - Publisher Copyright: © 2024 The Author(s). Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2025/1
Y1 - 2025/1
N2 - The improvement and decreasing costs of third-generation sequencing technologies have widened the scope of biological questions researchers can address with de novo genome assemblies. With the increasing number of reference genomes, validating their integrity with minimal overhead is vital for establishing confident results in their applications. Here, we present Klumpy, a tool for detecting and visualizing both misassembled regions in a genome assembly and genetic elements (e.g. genes) of interest in a set of sequences. By leveraging the initial raw reads in combination with their respective genome assembly, we illustrate Klumpy's utility by investigating antifreeze glycoprotein (afgp) loci across two icefishes, by searching for a reported absent gene in the northern snakehead fish, and by scanning the reference genomes of a mudskipper and bumblebee for misassembled regions. In the two former cases, we were able to provide support for the noncanonical placement of an afgp locus in the icefishes and locate the missing snakehead gene. Furthermore, our genome scans were able to identify an unmappable locus in the mudskipper reference genome and identify a putative repetitive element shared among several species of bees.
AB - The improvement and decreasing costs of third-generation sequencing technologies have widened the scope of biological questions researchers can address with de novo genome assemblies. With the increasing number of reference genomes, validating their integrity with minimal overhead is vital for establishing confident results in their applications. Here, we present Klumpy, a tool for detecting and visualizing both misassembled regions in a genome assembly and genetic elements (e.g. genes) of interest in a set of sequences. By leveraging the initial raw reads in combination with their respective genome assembly, we illustrate Klumpy's utility by investigating antifreeze glycoprotein (afgp) loci across two icefishes, by searching for a reported absent gene in the northern snakehead fish, and by scanning the reference genomes of a mudskipper and bumblebee for misassembled regions. In the two former cases, we were able to provide support for the noncanonical placement of an afgp locus in the icefishes and locate the missing snakehead gene. Furthermore, our genome scans were able to identify an unmappable locus in the mudskipper reference genome and identify a putative repetitive element shared among several species of bees.
KW - bioinformatics
KW - gene finding/annotation
KW - genomics
KW - long-read assembly
UR - http://www.scopus.com/inward/record.url?scp=85194572472&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194572472&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13982
DO - 10.1111/1755-0998.13982
M3 - Article
C2 - 38800997
AN - SCOPUS:85194572472
SN - 1755-098X
VL - 25
JO - Molecular ecology resources
JF - Molecular ecology resources
IS - 1
M1 - e13982
ER -