The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes

Ross Overbeek, Tadhg Begley, Ralph M. Butler, Jomuna V. Choudhuri, Han Yu Chuang, Matthew Cohoon, Valérie de Crécy-Lagard, Naryttza Diaz, Terry Disz, Robert Edwards, Michael Fonstein, Ed D. Frank, Svetlana Gerdes, Elizabeth M. Glass, Alexander Goesmann, Andrew Hanson, Dirk Iwata-Reuyl, Roy Jensen, Neema Jamshidi, Lutz Krause, Michael Kubal, Niels Larsen, Burkhard Linke, Alice C. McHardy, Folker Meyer, Heiko Neuweger, Gary Olsen, Robert Olson, Andrei Osterman, Vasiliy Portnoy, Gordon D. Pusch, Dmitry A. Rodionov, Christian Rückert, Jason Steiner, Rick Stevens, Ines Thiele, Olga Vassieva, Yuzhen Ye, Olga Zagnitko, Veronika Vonstein

Research output: Contribution to journal › Article

Abstract

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.
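The abstract refers to a "portable notion of a populated subsystem" without spelling it out. As a rough illustration only, the Python sketch below assumes a populated subsystem can be modeled as a spreadsheet whose columns are functional roles and whose rows are genomes, with each cell listing the genes assigned to that role. The names `PopulatedSubsystem`, `assign`, `missing_roles`, `roles_by_gene` and `conflicting_genes` are hypothetical and do not describe the SEED's actual file format or API; a "conflict" is simplified here to a gene that two subsystems assign to different roles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PopulatedSubsystem:
    """Hypothetical model of a populated subsystem as a spreadsheet:
    functional roles are columns, genomes are rows, and each cell holds
    the IDs of the genes assigned to that role in that genome."""

    name: str
    roles: list[str]
    # genome ID -> functional role -> gene IDs assigned to that role
    cells: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene: str) -> None:
        """Record that `gene` in `genome` implements `role`."""
        if role not in self.roles:
            raise ValueError(f"unknown role: {role}")
        self.cells.setdefault(genome, {}).setdefault(role, []).append(gene)

    def missing_roles(self, genome: str) -> list[str]:
        """Roles with no gene assigned in `genome`, in column order."""
        row = self.cells.get(genome, {})
        return [role for role in self.roles if not row.get(role)]

    def roles_by_gene(self) -> dict[str, set[str]]:
        """Invert the spreadsheet: gene ID -> set of roles it is assigned.
        Assumes gene IDs are unique across genomes (an assumption of this sketch)."""
        index: dict[str, set[str]] = {}
        for row in self.cells.values():
            for role, genes in row.items():
                for gene in genes:
                    index.setdefault(gene, set()).add(role)
        return index


def conflicting_genes(
    a: PopulatedSubsystem, b: PopulatedSubsystem
) -> dict[str, tuple[set[str], set[str]]]:
    """Genes present in both subsystems but assigned different role sets."""
    ra, rb = a.roles_by_gene(), b.roles_by_gene()
    return {
        gene: (ra[gene], rb[gene])
        for gene in ra.keys() & rb.keys()
        if ra[gene] != rb[gene]
    }
```

For example, after assigning hypothetical genes g1 and g2 to roles R1 and R2 in genome G1 of a three-role subsystem, `missing_roles("G1")` returns `["R3"]`; if a second subsystem assigns g2 to role R4, `conflicting_genes` reports `{"g2": ({"R2"}, {"R4"})}`.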

Original language: English (US)
Pages (from-to): 5691-5702
Number of pages: 12
Journal: Nucleic Acids Research
Volume: 33
Issue number: 17
DOI: https://doi.org/10.1093/nar/gki866
State: Published - Nov 11 2005

ASJC Scopus subject areas

  • Genetics

Cite this

Overbeek, R., Begley, T., Butler, R. M., Choudhuri, J. V., Chuang, H. Y., Cohoon, M., ... Vonstein, V. (2005). The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Research, 33(17), 5691-5702. https://doi.org/10.1093/nar/gki866

The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. / Overbeek, Ross; Begley, Tadhg; Butler, Ralph M.; Choudhuri, Jomuna V.; Chuang, Han Yu; Cohoon, Matthew; de Crécy-Lagard, Valérie; Diaz, Naryttza; Disz, Terry; Edwards, Robert; Fonstein, Michael; Frank, Ed D.; Gerdes, Svetlana; Glass, Elizabeth M.; Goesmann, Alexander; Hanson, Andrew; Iwata-Reuyl, Dirk; Jensen, Roy; Jamshidi, Neema; Krause, Lutz; Kubal, Michael; Larsen, Niels; Linke, Burkhard; McHardy, Alice C.; Meyer, Folker; Neuweger, Heiko; Olsen, Gary; Olson, Robert; Osterman, Andrei; Portnoy, Vasiliy; Pusch, Gordon D.; Rodionov, Dmitry A.; Rückert, Christian; Steiner, Jason; Stevens, Rick; Thiele, Ines; Vassieva, Olga; Ye, Yuzhen; Zagnitko, Olga; Vonstein, Veronika.

In: Nucleic Acids Research, Vol. 33, No. 17, 11.11.2005, p. 5691-5702.

Overbeek, R, Begley, T, Butler, RM, Choudhuri, JV, Chuang, HY, Cohoon, M, de Crécy-Lagard, V, Diaz, N, Disz, T, Edwards, R, Fonstein, M, Frank, ED, Gerdes, S, Glass, EM, Goesmann, A, Hanson, A, Iwata-Reuyl, D, Jensen, R, Jamshidi, N, Krause, L, Kubal, M, Larsen, N, Linke, B, McHardy, AC, Meyer, F, Neuweger, H, Olsen, G, Olson, R, Osterman, A, Portnoy, V, Pusch, GD, Rodionov, DA, Rückert, C, Steiner, J, Stevens, R, Thiele, I, Vassieva, O, Ye, Y, Zagnitko, O & Vonstein, V 2005, 'The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes', Nucleic Acids Research, vol. 33, no. 17, pp. 5691-5702. https://doi.org/10.1093/nar/gki866

Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Research. 2005 Nov 11;33(17):5691-5702. https://doi.org/10.1093/nar/gki866

Overbeek, Ross ; Begley, Tadhg ; Butler, Ralph M. ; Choudhuri, Jomuna V. ; Chuang, Han Yu ; Cohoon, Matthew ; de Crécy-Lagard, Valérie ; Diaz, Naryttza ; Disz, Terry ; Edwards, Robert ; Fonstein, Michael ; Frank, Ed D. ; Gerdes, Svetlana ; Glass, Elizabeth M. ; Goesmann, Alexander ; Hanson, Andrew ; Iwata-Reuyl, Dirk ; Jensen, Roy ; Jamshidi, Neema ; Krause, Lutz ; Kubal, Michael ; Larsen, Niels ; Linke, Burkhard ; McHardy, Alice C. ; Meyer, Folker ; Neuweger, Heiko ; Olsen, Gary ; Olson, Robert ; Osterman, Andrei ; Portnoy, Vasiliy ; Pusch, Gordon D. ; Rodionov, Dmitry A. ; Rückert, Christian ; Steiner, Jason ; Stevens, Rick ; Thiele, Ines ; Vassieva, Olga ; Ye, Yuzhen ; Zagnitko, Olga ; Vonstein, Veronika. / The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. In: Nucleic Acids Research. 2005 ; Vol. 33, No. 17. pp. 5691-5702.

@article{b9433f7ec4fb4a01a1e60ee505718840,
title = "The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes",
abstract = "The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.",
author = "Ross Overbeek and Tadhg Begley and Butler, {Ralph M.} and Choudhuri, {Jomuna V.} and Chuang, {Han Yu} and Matthew Cohoon and {de Cr{\'e}cy-Lagard}, Val{\'e}rie and Naryttza Diaz and Terry Disz and Robert Edwards and Michael Fonstein and Frank, {Ed D.} and Svetlana Gerdes and Glass, {Elizabeth M.} and Alexander Goesmann and Andrew Hanson and Dirk Iwata-Reuyl and Roy Jensen and Neema Jamshidi and Lutz Krause and Michael Kubal and Niels Larsen and Burkhard Linke and McHardy, {Alice C.} and Folker Meyer and Heiko Neuweger and Gary Olsen and Robert Olson and Andrei Osterman and Vasiliy Portnoy and Pusch, {Gordon D.} and Rodionov, {Dmitry A.} and Christian R{\"u}ckert and Jason Steiner and Rick Stevens and Ines Thiele and Olga Vassieva and Yuzhen Ye and Olga Zagnitko and Veronika Vonstein",
year = "2005",
month = "11",
day = "11",
doi = "10.1093/nar/gki866",
language = "English (US)",
volume = "33",
pages = "5691--5702",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR

T1 - The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes

AU - Overbeek, Ross

AU - Begley, Tadhg

AU - Butler, Ralph M.

AU - Choudhuri, Jomuna V.

AU - Chuang, Han Yu

AU - Cohoon, Matthew

AU - de Crécy-Lagard, Valérie

AU - Diaz, Naryttza

AU - Disz, Terry

AU - Edwards, Robert

AU - Fonstein, Michael

AU - Frank, Ed D.

AU - Gerdes, Svetlana

AU - Glass, Elizabeth M.

AU - Goesmann, Alexander

AU - Hanson, Andrew

AU - Iwata-Reuyl, Dirk

AU - Jensen, Roy

AU - Jamshidi, Neema

AU - Krause, Lutz

AU - Kubal, Michael

AU - Larsen, Niels

AU - Linke, Burkhard

AU - McHardy, Alice C.

AU - Meyer, Folker

AU - Neuweger, Heiko

AU - Olsen, Gary

AU - Olson, Robert

AU - Osterman, Andrei

AU - Portnoy, Vasiliy

AU - Pusch, Gordon D.

AU - Rodionov, Dmitry A.

AU - Rückert, Christian

AU - Steiner, Jason

AU - Stevens, Rick

AU - Thiele, Ines

AU - Vassieva, Olga

AU - Ye, Yuzhen

AU - Zagnitko, Olga

AU - Vonstein, Veronika

PY - 2005/11/11

Y1 - 2005/11/11

N2 - The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.

AB - The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.

UR - http://www.scopus.com/inward/record.url?scp=25644458211&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=25644458211&partnerID=8YFLogxK

U2 - 10.1093/nar/gki866

DO - 10.1093/nar/gki866

M3 - Article

C2 - 16214803

AN - SCOPUS:25644458211

VL - 33

SP - 5691

EP - 5702

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 17

ER -