MotifNetwork: A grid-enabled workflow for high-throughput domain analysis of biological sequences: Implications for annotation and study of phylogeny, protein interactions, and intraspecies variation

Jeffrey L. Tilson, Gloria Rendon, Mao Feng Ger, Eric Jakobsson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditionally, bioinformatics has been organized around the concepts of genes and gene products, typically proteins. Proteins are represented as sequences of amino acids and are analyzed against each other by alignment and similarity of their amino acids. However, proteins contain subsequences that define their activity and mode of regulation. These subsequences are referred to as "domains" and "motifs". For understanding many aspects of gene function, gene interaction, and gene and organism evolution, there is an advantage to focusing analysis on the domain/motif level rather than on the gene level. Such analysis is inherently highly computationally intensive because of the exponential growth of the protein databases and the combinatorial number of ways in which domains and motifs interact with each other. Here we report, by means of a biological example, on our efforts to build a user-friendly environment for facilitating such analysis. The name of this environment is the MotifNetwork. The MotifNetwork is an integration effort to build a suite of biologically oriented and grid-enabled workflows for high-throughput domain analysis of protein sequences. Workflow orchestration and enactment are handled with Taverna [Oinn, 2004]. The supporting grid-enabling services used to wrap and invoke the computational applications are implemented with the Generic Service Toolkit (GST) [Kandaswamy, 2006]. The ultimate results of this environment are data products, organized as matrices, and visualization files suitable for quick analysis. Detailed descriptions of data products from a representative biological example are presented. Lastly, some preliminary performance data are displayed, including use of the workflow to determine the domain architecture of all proteins in a complete genome (the honeybee). Extension to comprehensive analysis of SNPs in a genome is discussed.
The MotifNetwork workflow is or will soon be available online through the RENCI Science Gateway at http://www.tgbioportal.org/.

Original language: English (US)
Title of host publication: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
Pages: 620-627
Number of pages: 8
State: Published - Dec 1 2007
Event: 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE - Boston, MA, United States
Duration: Jan 14 2007 → Jan 17 2007

Publication series

Name: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE

Conference

Conference: 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
Country: United States
City: Boston, MA
Period: 1/14/07 → 1/17/07

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Bioengineering

