TY - JOUR
T1 - A new long-read mitochondrial-genome protocol (PacBio HiFi) for haemosporidian parasites
T2 - a tool for population and biodiversity studies
AU - Pacheco, M. Andreína
AU - Cepeda, Axl S.
AU - Miller, Erica A.
AU - Beckerman, Scott
AU - Oswald, Mitchell
AU - London, Evan
AU - Mateus-Pinilla, Nohra E.
AU - Escalante, Ananias A.
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
N2 - Background: Studies on haemosporidian diversity, including origin of human malaria parasites, malaria's zoonotic dynamic, and regional biodiversity patterns, have used target gene approaches. However, current methods have a trade-off between scalability and data quality. Here, a long-read Next-Generation Sequencing protocol using PacBio HiFi is presented. The data processing is supported by a pipeline that uses machine-learning for analysing the reads. Methods: A set of primers was designed to target approximately 6 kb, almost the entire length of the haemosporidian mitochondrial genome. Amplicons from different samples were multiplexed in an SMRTbell® library preparation. A pipeline (HmtG-PacBio Pipeline) to process the reads is also provided; it integrates multiple sequence alignments, a machine-learning algorithm that uses modified variational autoencoders, and a clustering method to identify the mitochondrial haplotypes/species in a sample. Although 192 specimens could be studied simultaneously, a pilot experiment with 15 specimens is presented, including in silico experiments where multiple data combinations were tested. Results: The primers amplified various haemosporidian parasite genomes and yielded high-quality mt genome sequences. This new protocol allowed the detection and characterization of mixed infections and co-infections in the samples. The machine-learning approach converged into reproducible haplotypes with a low error rate, averaging 0.2% per read (minimum of 0.03% and maximum of 0.46%). The minimum recommended coverage per haplotype is 30X based on the detected error rates. The pipeline facilitates inspecting the data, including a local blast against a file of provided mitochondrial sequences that the researcher can customize. Conclusions: This is not a diagnostic approach but a high-throughput method to study haemosporidian sequence assemblages and perform genotyping by targeting the mitochondrial genome. Accordingly, the methodology allowed for examining specimens with multiple infections and co-infections of different haemosporidian parasites. The pipeline enables data quality assessment and comparison of the haplotypes obtained to those from previous studies. Although a single locus approach, whole mitochondrial data provide high-quality information to characterize species pools of haemosporidian parasites.
AB - Background: Studies on haemosporidian diversity, including origin of human malaria parasites, malaria's zoonotic dynamic, and regional biodiversity patterns, have used target gene approaches. However, current methods have a trade-off between scalability and data quality. Here, a long-read Next-Generation Sequencing protocol using PacBio HiFi is presented. The data processing is supported by a pipeline that uses machine-learning for analysing the reads. Methods: A set of primers was designed to target approximately 6 kb, almost the entire length of the haemosporidian mitochondrial genome. Amplicons from different samples were multiplexed in an SMRTbell® library preparation. A pipeline (HmtG-PacBio Pipeline) to process the reads is also provided; it integrates multiple sequence alignments, a machine-learning algorithm that uses modified variational autoencoders, and a clustering method to identify the mitochondrial haplotypes/species in a sample. Although 192 specimens could be studied simultaneously, a pilot experiment with 15 specimens is presented, including in silico experiments where multiple data combinations were tested. Results: The primers amplified various haemosporidian parasite genomes and yielded high-quality mt genome sequences. This new protocol allowed the detection and characterization of mixed infections and co-infections in the samples. The machine-learning approach converged into reproducible haplotypes with a low error rate, averaging 0.2% per read (minimum of 0.03% and maximum of 0.46%). The minimum recommended coverage per haplotype is 30X based on the detected error rates. The pipeline facilitates inspecting the data, including a local blast against a file of provided mitochondrial sequences that the researcher can customize. Conclusions: This is not a diagnostic approach but a high-throughput method to study haemosporidian sequence assemblages and perform genotyping by targeting the mitochondrial genome. Accordingly, the methodology allowed for examining specimens with multiple infections and co-infections of different haemosporidian parasites. The pipeline enables data quality assessment and comparison of the haplotypes obtained to those from previous studies. Although a single locus approach, whole mitochondrial data provide high-quality information to characterize species pools of haemosporidian parasites.
KW - Co-infections
KW - Haemoproteus
KW - Leucocytozoon
KW - Machine learning
KW - Mitochondrial genome
KW - Mixed infection
KW - Plasmodium
UR - http://www.scopus.com/inward/record.url?scp=85192060436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192060436&partnerID=8YFLogxK
U2 - 10.1186/s12936-024-04961-8
DO - 10.1186/s12936-024-04961-8
M3 - Article
C2 - 38704592
AN - SCOPUS:85192060436
SN - 1475-2875
VL - 23
JO - Malaria Journal
JF - Malaria Journal
IS - 1
M1 - 134
ER -