TY - JOUR
T1 - UnSplicer: Mapping spliced RNA-seq reads in compact genomes and filtering noisy splicing
AU - Burns, Paul D.
AU - Li, Yang
AU - Ma, Jian
AU - Borodovsky, Mark
N1 - Funding Information:
National Institutes of Health [HG000783 to M.B.]; National Science Foundation [1054309 and 1262575 to J.M.]; National Institutes of Health [HG006464 to J.M.]. Funding for open access charge: National Institutes of Health [HG000783 to M.B.].
PY - 2014/2
Y1 - 2014/2
N2 - Accurate mapping of spliced RNA-Seq reads to genomic DNA has been known as a challenging problem. Despite significant efforts invested in developing efficient algorithms, with the human genome as a primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance compared with earlier developed algorithms such as TopHat and MapSplice. To improve detection of splice junctions, TrueSight uses information on statistical patterns of nucleotide ordering in intronic and exonic DNA. This line of research led to yet another new algorithm, UnSplicer, designed for eukaryotic species with compact genomes where functional alternative splicing is likely to be dominated by splicing noise. Genome-specific parameters of the new algorithm are generated by GeneMark-ES, an ab initio gene prediction algorithm based on unsupervised training. UnSplicer shares several components with TrueSight; the difference lies in the training strategy and the classification algorithm. We tested UnSplicer on RNA-Seq data sets of Arabidopsis thaliana, Caenorhabditis elegans, Cryptococcus neoformans and Drosophila melanogaster. We have shown that splice junctions inferred by UnSplicer are in better agreement with knowledge accumulated on these well-studied genomes than predictions made by earlier developed tools.
UR - http://www.scopus.com/inward/record.url?scp=84895826309&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84895826309&partnerID=8YFLogxK
U2 - 10.1093/nar/gkt1141
DO - 10.1093/nar/gkt1141
M3 - Article
C2 - 24259430
AN - SCOPUS:84895826309
SN - 0305-1048
VL - 42
SP - e25
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 4
ER -