TY - JOUR
T1 - PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads
T2 - An Application to CYP21A2 Genotyping
AU - Stephens, Zachary
AU - Milosevic, Dragana
AU - Kipp, Benjamin
AU - Grebe, Stefan
AU - Iyer, Ravishankar K.
AU - Kocher, Jean Pierre A.
N1 - Funding Information:
We would like to thank the Mayo Clinic Center for Individualized Medicine. Funding: This work was funded by the Mayo Clinic Center for Individualized Medicine. This study is the authors' independent work; the funding agency provided financial support only.
Publisher Copyright:
© 2021 Stephens, Milosevic, Kipp, Grebe, Iyer and Kocher.
PY - 2021/7/28
Y1 - 2021/7/28
N2 - Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy on simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL, using simulated reads that sweep over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with the clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at github.com/zstephens/pb-motif.
AB - Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy on simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL, using simulated reads that sweep over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with the clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at github.com/zstephens/pb-motif.
KW - bioinformatics
KW - computational biology
KW - congenital adrenal hyperplasia
KW - CYP21A2
KW - long reads
KW - pseudogene
KW - structural variation
UR - http://www.scopus.com/inward/record.url?scp=85112683898&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112683898&partnerID=8YFLogxK
U2 - 10.3389/fgene.2021.716586
DO - 10.3389/fgene.2021.716586
M3 - Article
C2 - 34394200
SN - 1664-8021
VL - 12
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 716586
ER -