TY - JOUR
T1 - SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition
AU - Feng, Junxi
AU - Sun, Mengtao
AU - Liu, Cong
AU - Zhang, Weiwei
AU - Xu, Changmou
AU - Wang, Jieqiong
AU - Wang, Guangshun
AU - Wan, Shibiao
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press. All rights reserved.
PY - 2024/11/1
Y1 - 2024/11/1
AB - It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. Identifying new-generation antibiotics is an effective way to address this concern. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), for discovering AMPs. However, existing methods only use features such as compositional, physicochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP)-based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns, such as sorting signals, at both the N-terminus and the C-terminus, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification and further improves performance compared with models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
KW - Antimicrobial peptides
KW - Ensemble learning
KW - Proportionalized split amino acid composition
KW - Random projection
KW - SAMP
UR - http://www.scopus.com/inward/record.url?scp=85212456890&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212456890&partnerID=8YFLogxK
U2 - 10.1093/bfgp/elae046
DO - 10.1093/bfgp/elae046
M3 - Article
C2 - 39573886
AN - SCOPUS:85212456890
SN - 2041-2649
VL - 23
SP - 879
EP - 890
JO - Briefings in Functional Genomics
JF - Briefings in Functional Genomics
IS - 6
ER -