A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity

Nam Phuong Nguyen, Tandy Warnow, Mihai Pop, Bryan White

Research output: Contribution to journal › Review article

Abstract

The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97%) into 'Operational Taxonomic Units' (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequences within that OTU. This perspective paper will discuss the known shortcomings of this standard approach using results obtained from the Human Microbiome Project. In particular, we will show that the traditional approach of using pairwise sequence alignments to compute sequence similarity can result in poorly clustered OTUs. As OTUs are typically annotated based upon a single representative sequence, poorly clustered OTUs can have significant impact on downstream analyses. These results suggest that we need to move beyond simple clustering techniques for 16S analysis.
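The pipeline summarized in the abstract (cluster at a similarity threshold, pick one representative per OTU, annotate it, copy the annotation to the other members) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' method: the abstract does not say which clustering algorithm is used, so a greedy "first representative within threshold" scheme is assumed here, and the similarity score (1 minus edit distance over the longer length) is a crude stand-in for an alignment-based percent identity. The function names `greedy_otus`, `similarity`, and `propagate_annotations`, and the `annotate` callback, are hypothetical.

```python
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: 1 - edit distance / longer sequence length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Map each representative sequence to the members of its OTU.

    Each sequence joins the first existing representative it matches at or
    above the threshold; otherwise it becomes a new representative.
    """
    otus: Dict[str, List[str]] = {}
    for s in seqs:
        for rep in otus:
            if similarity(s, rep) >= threshold:
                otus[rep].append(s)
                break
        else:
            otus[s] = [s]
    return otus


def propagate_annotations(
    otus: Dict[str, List[str]], annotate: Callable[[str], str]
) -> Dict[str, str]:
    """Annotate only the representative and copy its label to every member."""
    return {m: annotate(rep) for rep, members in otus.items() for m in members}
```

Because only the representative is ever annotated, any member that was placed in the wrong OTU inherits the wrong label, which is the downstream risk the abstract points to. Note also that in this greedy variant the result depends on input order, one reason the choice of clustering scheme matters.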

Original language: English (US)
Article number: 16004
Journal: npj Biofilms and Microbiomes
Volume: 2
DOI: 10.1038/npjbiofilms.2016.4
State: Published - Apr 20 2016

Fingerprint

Cluster Analysis
Sequence Alignment
Microbiota

ASJC Scopus subject areas

  • Biotechnology
  • Microbiology
  • Applied Microbiology and Biotechnology

Cite this

A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity. / Nguyen, Nam Phuong; Warnow, Tandy; Pop, Mihai; White, Bryan.

In: npj Biofilms and Microbiomes, Vol. 2, 16004, 20.04.2016.

@article{575128b4eadc4de5a0d01bf06c9396b3,
title = "A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity",
abstract = "The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97{\%}) into 'Operational Taxonomic Units' (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequences within that OTU. This perspective paper will discuss the known shortcomings of this standard approach using results obtained from the Human Microbiome Project. In particular, we will show that the traditional approach of using pairwise sequence alignments to compute sequence similarity can result in poorly clustered OTUs. As OTUs are typically annotated based upon a single representative sequence, poorly clustered OTUs can have significant impact on downstream analyses. These results suggest that we need to move beyond simple clustering techniques for 16S analysis.",
author = "Nguyen, {Nam Phuong} and Tandy Warnow and Mihai Pop and Bryan White",
year = "2016",
month = "4",
day = "20",
doi = "10.1038/npjbiofilms.2016.4",
language = "English (US)",
volume = "2",
journal = "npj Biofilms and Microbiomes",
issn = "2055-5008",
publisher = "Nature Publishing Group",
}

TY - JOUR
T1 - A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity
AU - Nguyen, Nam Phuong
AU - Warnow, Tandy
AU - Pop, Mihai
AU - White, Bryan
PY - 2016/4/20
Y1 - 2016/4/20
AB - The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97%) into 'Operational Taxonomic Units' (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequences within that OTU. This perspective paper will discuss the known shortcomings of this standard approach using results obtained from the Human Microbiome Project. In particular, we will show that the traditional approach of using pairwise sequence alignments to compute sequence similarity can result in poorly clustered OTUs. As OTUs are typically annotated based upon a single representative sequence, poorly clustered OTUs can have significant impact on downstream analyses. These results suggest that we need to move beyond simple clustering techniques for 16S analysis.
UR - http://www.scopus.com/inward/record.url?scp=85016922856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016922856&partnerID=8YFLogxK
U2 - 10.1038/npjbiofilms.2016.4
DO - 10.1038/npjbiofilms.2016.4
M3 - Review article
AN - SCOPUS:85016922856
VL - 2
JO - npj Biofilms and Microbiomes
JF - npj Biofilms and Microbiomes
SN - 2055-5008
M1 - 16004
ER -