PATtyFams: Protein families for the microbial genomes in the PATRIC database

James J. Davis, Svetlana Gerdes, Gary J. Olsen, Robert Olson, Gordon D. Pusch, Maulik Shukla, Veronika Vonstein, Alice R. Wattam, Hyunseung Yoo

Research output: Contribution to journal (Article)

Abstract

The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
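The two-stage idea in the abstract (group proteins by assigned function, then split each group with MCL) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the protein IDs, function labels, and pairwise similarity scores are invented, the helper names are hypothetical, and the inflation value and iteration count are arbitrary. The abstract does not say how PATtyFams scores similarity or parameterizes MCL, and the loop below is a bare-bones textbook MCL (expansion, inflation, column normalization), not the authors' implementation or the MCL software itself.

```python
from collections import defaultdict


def group_by_function(assignments):
    """Collect protein IDs under the function label assigned to them."""
    groups = defaultdict(list)
    for protein_id, function in sorted(assignments.items()):
        groups[function].append(protein_id)
    return dict(groups)


def normalize_columns(m):
    """Rescale each column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(ids, similarity, inflation=2.0, iterations=50, threshold=1e-6):
    """Cluster ids with a minimal Markov Cluster loop (illustrative only)."""
    n = len(ids)
    # Similarity matrix with self-loops; unlisted pairs count as 0.
    m = normalize_columns([
        [1.0 if i == j else similarity.get(frozenset((ids[i], ids[j])), 0.0)
         for j in range(n)]
        for i in range(n)
    ])
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random-walk flow).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])

    # Interpretation: join node j to row i wherever flow remains,
    # then read clusters off as connected components (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(ids[i])
    return sorted(sorted(c) for c in clusters.values())


def build_families(assignments, similarity):
    """Stage 1: group by function. Stage 2: split each group with MCL."""
    return {function: mcl(ids, similarity)
            for function, ids in group_by_function(assignments).items()}


# Toy input: hypothetical proteins, functions, and similarity scores.
assignments = {
    "p1": "DNA gyrase subunit A",
    "p2": "DNA gyrase subunit A",
    "p3": "DNA gyrase subunit A",
    "p4": "DNA gyrase subunit A",
    "p5": "Thioredoxin",
}
similarity = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p3", "p4")): 0.8,
    frozenset(("p2", "p3")): 0.05,  # weak link between the two subgroups
}
families = build_families(assignments, similarity)
```

In this toy input the four proteins sharing one function label are split into two families because the only connection between the pairs is weak, while the single protein with a different label forms its own family. Grouping by function first keeps each MCL run small, which is one plausible reading of why the abstract describes the approach as scalable.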

Original language: English (US)
Article number: 118
Journal: Frontiers in Microbiology
Volume: 7
Issue number: FEB
DOI: https://doi.org/10.3389/fmicb.2016.00118
State: Published - Feb 8 2016

Fingerprint

Microbial Genome
Databases
Proteins
Genome
Computational Biology
Technology

Keywords

  • Comparative genomics
  • FIGfams
  • Genome annotation
  • Metabolic modeling
  • RAST

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

Cite this

PATtyFams: Protein families for the microbial genomes in the PATRIC database. / Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung.

In: Frontiers in Microbiology, Vol. 7, No. FEB, 118, 08.02.2016.

Davis, JJ, Gerdes, S, Olsen, GJ, Olson, R, Pusch, GD, Shukla, M, Vonstein, V, Wattam, AR & Yoo, H 2016, 'PATtyFams: Protein families for the microbial genomes in the PATRIC database', Frontiers in Microbiology, vol. 7, no. FEB, 118. https://doi.org/10.3389/fmicb.2016.00118
Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung. / PATtyFams: Protein families for the microbial genomes in the PATRIC database. In: Frontiers in Microbiology. 2016; Vol. 7, No. FEB.
@article{30412149fda549138a3f2c8a184f3dbb,
title = "PATtyFams: Protein families for the microbial genomes in the PATRIC database",
abstract = "The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.",
keywords = "Comparative genomics, FIGfams, Genome annotation, Metabolic modeling, RAST",
author = "Davis, {James J.} and Svetlana Gerdes and Olsen, {Gary J.} and Robert Olson and Pusch, {Gordon D.} and Maulik Shukla and Veronika Vonstein and Wattam, {Alice R.} and Hyunseung Yoo",
year = "2016",
month = "2",
day = "8",
doi = "10.3389/fmicb.2016.00118",
language = "English (US)",
volume = "7",
journal = "Frontiers in Microbiology",
issn = "1664-302X",
publisher = "Frontiers Media S.A.",
number = "FEB",

}

TY - JOUR

T1 - PATtyFams

T2 - Protein families for the microbial genomes in the PATRIC database

AU - Davis, James J.

AU - Gerdes, Svetlana

AU - Olsen, Gary J.

AU - Olson, Robert

AU - Pusch, Gordon D.

AU - Shukla, Maulik

AU - Vonstein, Veronika

AU - Wattam, Alice R.

AU - Yoo, Hyunseung

PY - 2016/2/8

Y1 - 2016/2/8

AB - The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

KW - Comparative genomics

KW - FIGfams

KW - Genome annotation

KW - Metabolic modeling

KW - RAST

UR - http://www.scopus.com/inward/record.url?scp=84962097084&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962097084&partnerID=8YFLogxK

U2 - 10.3389/fmicb.2016.00118

DO - 10.3389/fmicb.2016.00118

M3 - Article

AN - SCOPUS:84962097084

VL - 7

JO - Frontiers in Microbiology

JF - Frontiers in Microbiology

SN - 1664-302X

IS - FEB

M1 - 118

ER -