Searching the expressed sequence tag (EST) databases: panning for genes.

C. V. Jongeneel

Research output: Contribution to journal, Article, peer-review


The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

Original language: English (US)
Pages (from-to): 76-92
Number of pages: 17
Journal: Briefings in Bioinformatics
Issue number: 1
State: Published - Feb 2000
Externally published: Yes

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

