Abstract
Motivation: The production of neuropeptides from their precursor proteins is the result of a complex series of enzymatic processing steps. The annotation of new neuropeptide genes from sequence information often outstrips biochemical assays, so bioinformatics tools can provide rapid information on the peptides most likely to be produced by a gene. Predicting the final bioactive neuropeptides from precursor proteins requires accurate algorithms to determine which locations in the protein are cleaved.

Results: Predictive models were trained on Apis mellifera and Drosophila melanogaster precursors using binary logistic regression, multi-layer perceptron and k-nearest neighbor models. The final predictive models included specific amino acids at locations relative to the cleavage sites. Correct classification rates ranged from 78 to 100%, indicating that the models adequately predicted cleaved and non-cleaved positions across a wide range of neuropeptide families and insect species. The model trained on D. melanogaster data had better generalization properties than the model trained on A. mellifera data for the data sets considered. The reliable and consistent performance of the models in the test data sets suggests that the bioinformatics strategies proposed here can accurately predict neuropeptides in insects for which sequence information is available, based on neuropeptides with biochemical and sequence information in well-studied species.
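The general idea of classifying candidate sites from the amino acids at positions relative to the site can be illustrated with a minimal sketch. It is not the authors' implementation: the window size (P3 to P1'), the helper names `window_features`, `train` and `predict_proba`, the plain stochastic gradient descent fit, and the `TOY_SITES` sequences are all assumptions made for illustration, and the toy sequences are invented rather than taken from any real precursor.

```python
import math

def window_features(seq, p1, n_left=3, n_right=1):
    """Indicator features for the residues at P3..P1 and P1' around a candidate site.

    `p1` is the 0-based index of the residue just before the candidate bond.
    The window size is an illustrative assumption, not the published model.
    """
    feats = {}
    for k in range(n_left):                      # P1, P2, P3 (N-terminal side)
        i = p1 - k
        aa = seq[i] if i >= 0 else "-"
        feats[f"P{k + 1}={aa}"] = 1.0
    for k in range(n_right):                     # P1' (C-terminal side)
        i = p1 + 1 + k
        aa = seq[i] if i < len(seq) else "-"
        feats[f"P{k + 1}'={aa}"] = 1.0
    return feats

def score(weights, bias, feats):
    """Linear predictor of the logistic model."""
    return bias + sum(weights.get(name, 0.0) * v for name, v in feats.items())

def predict_proba(weights, bias, feats):
    """Estimated probability that the candidate site is cleaved."""
    return 1.0 / (1.0 + math.exp(-score(weights, bias, feats)))

def train(examples, epochs=200, lr=0.5):
    """Fit a binary logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            err = predict_proba(weights, bias, feats) - label
            bias -= lr * err
            for name, v in feats.items():
                weights[name] = weights.get(name, 0.0) - lr * err * v
    return weights, bias

# Invented toy windows: (sequence, index of P1, 1 = cleaved / 0 = not cleaved).
TOY_SITES = [
    ("AGKRSP", 3, 1),
    ("LEKRAF", 3, 1),
    ("AGSKPL", 3, 0),
    ("TVAKPE", 3, 0),
]

examples = [(window_features(s, i), y) for s, i, y in TOY_SITES]
weights, bias = train(examples)

p = predict_proba(weights, bias, window_features("QMKRGG", 3))
print(f"Estimated P(cleaved) for the toy window: {p:.2f}")
```

In practice, a model of this kind would be fitted with established statistical software on experimentally confirmed cleavage sites and evaluated on held-out precursors; the multi-layer perceptron and k-nearest neighbor models named in the abstract could consume the same indicator features.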
Original language | English (US) |
---|---|
Pages (from-to) | 815-825 |
Number of pages | 11 |
Journal | Bioinformatics |
Volume | 24 |
Issue number | 6 |
State | Published - Mar 2008 |
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics