TY - JOUR
T1 - The bench scientist's guide to statistical analysis of RNA-Seq data
AU - Yendrek, Craig R.
AU - Ainsworth, Elizabeth A.
AU - Thimmapuram, Jyothi
N1 - Funding Information: The authors would like to thank Matt Hudson for critical review of the manuscript and Alvaro Hernandez and the High-Throughput Sequencing and Genotyping Unit in the W.M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for carrying out the library preparation and RNA sequencing. This work was funded by the National Soybean Research Laboratory's Soybean Disease Biotechnology Center.
PY - 2012
Y1 - 2012
AB - Background: RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of error in RNA-Seq analysis that could alter interpretation of global changes in gene expression. Findings: When comparing statistical tools, the negative binomial distribution-based methods edgeR and DESeq identified 11,995 and 11,317 differentially expressed genes, respectively, from an RNA-Seq dataset generated from soybean leaf tissue grown in elevated O3. However, only 10,535 genes were common to both methods, leaving 2,242 genes determined to be differentially expressed by only one method. Analysis of the non-significant genes revealed several limitations of these analytic tools, including evidence for overly stringent parameters for determining statistical significance of differentially expressed genes as well as increased type II error for high-abundance transcripts. Conclusions: Because of the high variability between methods for determining differential expression of RNA-Seq data, we suggest using several bioinformatics tools, as outlined here, to ensure that a conservative list of differentially expressed genes is obtained. We also conclude that despite these analytical limitations, RNA-Seq provides highly accurate transcript abundance quantification that is comparable to qRT-PCR.
KW - Differential Expression
KW - RNA-Seq
KW - Statistical analysis
UR - http://www.scopus.com/inward/record.url?scp=84866145935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866145935&partnerID=8YFLogxK
U2 - 10.1186/1756-0500-5-506
DO - 10.1186/1756-0500-5-506
M3 - Article
C2 - 22980220
AN - SCOPUS:84866145935
SN - 1756-0500
VL - 5
JO - BMC Research Notes
JF - BMC Research Notes
M1 - 506
ER -