TY - JOUR
T1 - A roadmap for natural product discovery based on large-scale genomics and metabolomics
AU - Doroghazi, James R.
AU - Albright, Jessica C.
AU - Goering, Anthony W.
AU - Ju, Kou San
AU - Haines, Robert R.
AU - Tchalukov, Konstantin A.
AU - Labeda, David P.
AU - Kelleher, Neil L.
AU - Metcalf, William W.
N1 - Funding Information:
J.R.D. was funded through an Institute for Genomic Biology fellowship. This work was supported in part by US National Institutes of Health grants GM PO1 GM077596 and GM 067725 (N.L.K.) and an Institute for Genomic Biology Proof of Concept grant. D.P.L. and the Agricultural Research Service (ARS) Culture Collection Current Research Information System project are funded through ARS National Program 301.
Publisher Copyright:
© 2014 Nature America, Inc. All rights reserved.
PY - 2014/11/1
Y1 - 2014/11/1
N2 - Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics, we developed a method for the global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprising 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence or absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of new natural products using large data sets. Extrapolation from the 830-genome data set reveals that Actinobacteria encode hundreds of thousands of future drug leads, and the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them.
UR - http://www.scopus.com/inward/record.url?scp=84922466451&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922466451&partnerID=8YFLogxK
U2 - 10.1038/nchembio.1659
DO - 10.1038/nchembio.1659
M3 - Article
C2 - 25262415
AN - SCOPUS:84922466451
SN - 1552-4450
VL - 10
SP - 963
EP - 968
JO - Nature Chemical Biology
JF - Nature Chemical Biology
IS - 11
ER -