TY - JOUR
T1 - Assessment and Comparison of Database Search Engines for Peptidomic Applications
AU - De La Toba, Eduardo A.
AU - Anapindi, Krishna D.B.
AU - Sweedler, Jonathan V.
N1 - Publisher Copyright: © 2023 American Chemical Society.
PY - 2023/10/6
Y1 - 2023/10/6
AB - Protein database search engines are an integral component of mass spectrometry-based peptidomic analyses. Given the unique computational challenges of peptidomics, many factors must be taken into consideration when optimizing search engine selection, as each platform has different algorithms by which tandem mass spectra are scored for subsequent peptide identifications. In this study, four different database search engines, PEAKS, MS-GF+, OMSSA, and X! Tandem, were compared with Aplysia californica and Rattus norvegicus peptidomics data sets, and various metrics were assessed such as the number of unique peptide and neuropeptide identifications, and peptide length distributions. Given the tested conditions, PEAKS was found to have the highest number of peptide and neuropeptide identifications out of the four search engines in both data sets. Furthermore, principal component analysis and multivariate logistic regression were employed to determine whether specific spectral features contribute to false C-terminal amidation assignments by each search engine. From this analysis, it was found that the primary features influencing incorrect peptide assignments were the precursor and fragment ion m/z errors. Finally, an assessment employing a mixed species protein database was performed to evaluate search engine precision and sensitivity when searched against an enlarged search space containing human proteins.
KW - C-terminal amidation
KW - database searching
KW - neuropeptides
KW - peptidomics
KW - post-translational modifications
KW - search engines
UR - http://www.scopus.com/inward/record.url?scp=85148912236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148912236&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.2c00307
DO - 10.1021/acs.jproteome.2c00307
M3 - Article
C2 - 36809008
AN - SCOPUS:85148912236
SN - 1535-3893
VL - 22
SP - 3123
EP - 3134
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 10
ER -