TY - JOUR
T1 - Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data
AU - Stephens, Zachary
AU - O'Brien, Daniel
AU - Dehankar, Mrunal
AU - Roberts, Lewis R.
AU - Iyer, Ravishankar K.
AU - Kocher, Jean Pierre
N1 - Publisher Copyright: © 2021 Stephens et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2021
Y1 - 2021/9
AB - The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short-read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next-generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long-read validation. We demonstrate this concordance across 6 TCGA hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq, and targeted capture.
UR - http://www.scopus.com/inward/record.url?scp=85115773132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115773132&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0250915
DO - 10.1371/journal.pone.0250915
M3 - Article
C2 - 34550971
AN - SCOPUS:85115773132
SN - 1932-6203
VL - 16
JO - PLOS ONE
JF - PLOS ONE
IS - 9
M1 - e0250915
ER -