Chapter nine Mining soybean expressed sequence tag and microarray data

Martina V. Strömvik, Françoise Thibaud-Nissen, Lila O. Vodkin

Research output: Contribution to journal › Article › peer-review


In summary, we have illustrated the approach of mining the large soybean EST collection to deduce knowledge about the expression of individual gene family members, using the soybean lectins as an example. Plants have many different sets of genes that are homologous at the sequence level but that may have very different biological functions in the different cells, tissues, or organs in which the gene product is active. The sequence alone provides little information about function. By adding information about spatial and temporal gene expression (as from 'electronic northerns'), we get a first view of gene expression profiles that puts us one step closer to understanding their functions and the chemical pathways in which they participate. In the case of lectins, we can be relatively certain that the seed lectin product of the Le1 gene has a role as a seed storage protein because of the abundance of ESTs representing Le1 in only the seed tissues. In addition, we see that there are three homologs closely related to Le1, but none of these is expressed in seeds. These related lectins may function as vegetative storage proteins or defend against pathogens in vegetative tissues. In the future, the different lectins can be investigated for insecticidal properties and possibly used in pathogen defense strategies through genetic engineering. The more distantly related apyrases, whose ESTs are found in root libraries, likely have a function in recognition of bacteria. As opposed to ESTs, microarrays provide a quantitative approach to global gene expression, and microarray data can be subjected to advanced statistical clustering analysis. Clustering the cDNAs by similarity of expression profile derived from microarray data over the course of early somatic embryogenesis allowed a determination of the timing of the molecular events taking place during that phase of development.
For example, several genes involved in polarity (adaxial versus abaxial) of the tissue culture soybean embryos were found³, and these could be targets for improving the process of regeneration of embryos from tissue culture. Of course, mRNA abundance data alone do not ensure that the metabolic products of a pathway are present, since control can be exerted at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational. However, mining the soybean EST databases and determining transcript profiles of many thousands of genes simultaneously using microarrays were not feasible for soybean until recently. These approaches will stimulate many avenues of research into the complex physiology and metabolic systems operational in this important crop. For example, we are currently determining the expression profiles of 27,000 soybean cDNAs during normal seed development in order to elucidate the developmental profiles of genes that produce compositional traits such as protein, oil, and secondary compounds. In addition, germplasm lines that vary in protein, oil, or isoflavone content can be examined by microarrays to determine key control points in the production of these compounds under normal or stress conditions. These studies will yield information that can be applied toward breeding or genetic engineering approaches to improve seed composition. Finally, the cDNA microarrays will enable similar studies in related legumes that do not yet have genomics resources available.
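The 'electronic northern' idea mentioned above, counting the ESTs that match a gene in each tissue library, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `electronic_northern`, the input layout of (gene, tissue) pairs, the scaling to ESTs per 10,000 sequenced clones, and the toy records, which are invented and are not counts from the soybean EST collection. It is not the procedure used in the chapter.

```python
from collections import Counter, defaultdict


def electronic_northern(est_records, library_sizes, per=10_000):
    """Tally ESTs per gene and tissue, scaled to `per` ESTs in each library.

    est_records   : iterable of (gene, tissue) pairs, one per EST
                    (hypothetical layout, assumed for this sketch)
    library_sizes : dict mapping tissue -> total ESTs sequenced from it
    """
    raw = defaultdict(Counter)
    for gene, tissue in est_records:
        raw[gene][tissue] += 1

    # Normalise by library size so deeply sequenced tissues do not dominate.
    return {
        gene: {
            tissue: per * count / library_sizes[tissue]
            for tissue, count in by_tissue.items()
        }
        for gene, by_tissue in raw.items()
    }


# Toy, invented data: NOT real EST counts.
toy_ests = [("Le1", "seed")] * 3 + [("homolog_A", "leaf")] * 2
toy_sizes = {"seed": 1000, "leaf": 2000}

profile = electronic_northern(toy_ests, toy_sizes)
for gene, by_tissue in profile.items():
    print(gene, by_tissue)
```

In this toy profile the gene labelled Le1 appears only in the seed library and the hypothetical homolog only in the leaf library, which is the kind of tissue-restricted pattern the summary uses to suggest function. Raw EST counts from real libraries are small and noisy, so any such profile is a first view and not a quantitative measurement.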

Original language: English (US)
Pages (from-to): 177-195
Number of pages: 19
Journal: Recent Advances in Phytochemistry
Issue number: C
State: Published - 2004

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Plant Science
  • Cell Biology

