TY - JOUR
T1 - V-REVCOMP
T2 - Automated high-throughput detection of reverse complementary 16S rRNA gene sequences in large environmental and taxonomic datasets
AU - Hartmann, Martin
AU - Howes, Charles G.
AU - Veldre, Vilmar
AU - Schneider, Salome
AU - Vaishampayan, Parag A.
AU - Yannarell, Anthony C.
AU - Quince, Christopher
AU - Johansson, Per
AU - Björkroth, K. Johanna
AU - Abarenkov, Kessy
AU - Hallam, Steven J.
AU - Mohn, William W.
AU - Nilsson, R. Henrik
PY - 2011/6
Y1 - 2011/6
N2 - Reverse complementary DNA sequences - sequences that are inadvertently given backwards with all purines and pyrimidines transposed - can affect sequence analysis detrimentally unless taken into account. We present an open-source, high-throughput software tool - v-revcomp - to detect and reorient reverse complementary entries of the small-subunit rRNA (16S) gene from sequencing datasets, particularly from environmental sources. The software supports sequence lengths ranging from full length down to the short reads that are characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406781 16S sequences deposited in release 102 of the curated SILVA database and demonstrated that the tool has a detection accuracy of virtually 100%. We subsequently used v-revcomp to analyse 1171646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were otherwise anomalous, including reverse complementary chimeras, sequences associated with wrong taxa, nonribosomal genes, sequences of poor quality or otherwise erroneous sequences without a reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient in detecting and reorienting reverse complementary 16S sequences of almost any length and can be used to detect various sequence anomalies.
AB - Reverse complementary DNA sequences - sequences that are inadvertently given backwards with all purines and pyrimidines transposed - can affect sequence analysis detrimentally unless taken into account. We present an open-source, high-throughput software tool - v-revcomp - to detect and reorient reverse complementary entries of the small-subunit rRNA (16S) gene from sequencing datasets, particularly from environmental sources. The software supports sequence lengths ranging from full length down to the short reads that are characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406781 16S sequences deposited in release 102 of the curated SILVA database and demonstrated that the tool has a detection accuracy of virtually 100%. We subsequently used v-revcomp to analyse 1171646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were otherwise anomalous, including reverse complementary chimeras, sequences associated with wrong taxa, nonribosomal genes, sequences of poor quality or otherwise erroneous sequences without a reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient in detecting and reorienting reverse complementary 16S sequences of almost any length and can be used to detect various sequence anomalies.
KW - 16S sequence
KW - HMMER
KW - Hidden Markov models
KW - Reverse complementary
KW - SSU rRNA gene
KW - Software
UR - http://www.scopus.com/inward/record.url?scp=79956127809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956127809&partnerID=8YFLogxK
U2 - 10.1111/j.1574-6968.2011.02274.x
DO - 10.1111/j.1574-6968.2011.02274.x
M3 - Article
C2 - 21453324
AN - SCOPUS:79956127809
SN - 0378-1097
VL - 319
SP - 140
EP - 145
JO - FEMS microbiology letters
JF - FEMS microbiology letters
IS - 2
ER -