CRITICA: Coding region identification tool invoking comparative analysis

Jonathan H. Badger, Gary J. Olsen

Research output: Contribution to journal, Article

Abstract

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
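
The two signals described in the abstract can be illustrated with a toy Python sketch. This is not CRITICA's actual scoring: the function names, the ungapped in-frame alignment, the pseudocount, and the example sequences are all assumptions made for illustration. The first part compares nucleotide identity with amino acid identity of the translations; the second computes a simple log-odds score from in-frame hexanucleotide (dicodon) counts.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identities(seq_a, seq_b):
    """Return (nucleotide identity, amino acid identity) for an ungapped pair."""
    return identity(seq_a, seq_b), identity(translate(seq_a), translate(seq_b))


def dicodon_counts(seq, frame=0):
    """Count hexanucleotides that start on codon boundaries of the given frame."""
    return Counter(seq[i:i + 6] for i in range(frame, len(seq) - 5, 3))


def dicodon_log_odds(seq, coding, background, pseudo=1.0):
    """Sum of log(P_coding / P_background) over in-frame hexamers of seq.

    `coding` and `background` are Counters of hexamer counts; the pseudocount
    (an assumption of this sketch) keeps unseen hexamers from giving log(0).
    """
    total_c = sum(coding.values()) + pseudo * 4 ** 6
    total_b = sum(background.values()) + pseudo * 4 ** 6
    score = 0.0
    for hexamer, n in dicodon_counts(seq).items():
        p_c = (coding[hexamer] + pseudo) / total_c
        p_b = (background[hexamer] + pseudo) / total_b
        score += n * math.log(p_c / p_b)
    return score


# Hypothetical aligned pair differing only at synonymous third positions.
nt_id, aa_id = identities("ATGGCTAAAGGT", "ATGGCCAAGGGA")
print(nt_id, aa_id)  # 0.75 1.0
```

In the example pair, the translations are identical even though a quarter of the nucleotides differ, which is the kind of pattern the abstract treats as evidence for coding. How much amino acid identity is "expected" for a given nucleotide identity is left out here; the paper defines that, and this sketch does not attempt to reproduce it.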

Original language: English (US)
Pages (from-to): 512-524
Number of pages: 13
Journal: Molecular Biology and Evolution
Volume: 16
Issue number: 4
DOI: https://doi.org/10.1093/oxfordjournals.molbev.a026133
State: Published - Apr 1999

Fingerprint

nucleotide sequences
DNA
Databases
Nucleic Acid Databases
Salmonella typhimurium
World Wide Web
DNA Sequence Analysis
Internet
prediction
Nucleotides
translation (genetics)
Genome
data analysis
Amino Acids
analysis

Keywords

  • Coding sequence prediction
  • Dicodon bias
  • Genomics
  • Salmonella typhimurium
  • Sequence analysis

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

CRITICA: Coding region identification tool invoking comparative analysis. / Badger, Jonathan H.; Olsen, Gary J.

In: Molecular Biology and Evolution, Vol. 16, No. 4, 04.1999, p. 512-524.

@article{63a9b0e49fe3402d99d3b3044235aa7f,
title = "CRITICA: Coding region identification tool invoking comparative analysis",
abstract = "Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).",
keywords = "Coding sequence prediction, Dicodon bias, Genomics, Salmonella typhimurium, Sequence analysis",
author = "Badger, {Jonathan H.} and Olsen, {Gary J.}",
year = "1999",
month = "4",
doi = "10.1093/oxfordjournals.molbev.a026133",
language = "English (US)",
volume = "16",
pages = "512--524",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "4",

}

TY - JOUR

T1 - CRITICA: Coding region identification tool invoking comparative analysis

AU - Badger, Jonathan H.

AU - Olsen, Gary J.

PY - 1999/4

Y1 - 1999/4

AB - Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).

KW - Coding sequence prediction

KW - Dicodon bias

KW - Genomics

KW - Salmonella typhimurium

KW - Sequence analysis

UR - http://www.scopus.com/inward/record.url?scp=0032900737&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032900737&partnerID=8YFLogxK

U2 - 10.1093/oxfordjournals.molbev.a026133

DO - 10.1093/oxfordjournals.molbev.a026133

M3 - Article

C2 - 10331277

AN - SCOPUS:0032900737

VL - 16

SP - 512

EP - 524

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 4

ER -