Abstract
Protein sequence alignment is a fundamental problem in computational structural biology and is widely used for protein 3D structure prediction and protein homology detection. Most existing programs for detecting protein sequence alignments are based on the likelihood information of amino acids and are sensitive to alignment noise. We present PALM, a robust method for modeling pairwise protein structure alignments that uses the area distance to reduce biological measurement noise. PALM generatively learns the alignment of two protein sequences with a probabilistic area distance objective, which can denoise the measurement errors and offsets introduced by different biologists. During learning, we show that the optimization is computationally efficient by estimating the gradients via dynamically sampling alignments. Empirically, we show that PALM generates sequence alignments with higher precision and recall, as well as smaller area distance, than competing methods, especially for long protein sequences and remote homologies. This study suggests that PALM is a promising option for learning over large-scale protein sequence alignment problems.
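The abstract does not define the area distance, so the following is only an illustrative sketch under a stated assumption: each pairwise alignment is treated as a monotone path through the dynamic-programming grid, and the distance between two alignments is the area enclosed between their paths. The operation encoding (`M`, `X`, `Y`) and the function names `column_profile` and `area_distance` are hypothetical and are not taken from PALM.

```python
from typing import List, Tuple


def column_profile(ops: str) -> Tuple[List[Tuple[int, int]], int]:
    """Convert an alignment string into a per-column height profile.

    Assumed (hypothetical) encoding:
      'M' = match/mismatch: advance in both sequences (diagonal step)
      'X' = advance in the first sequence only (horizontal step)
      'Y' = advance in the second sequence only (vertical step)

    Returns the list of (y_start, y_end) pairs, one per position of the
    first sequence, and the final y coordinate of the path.
    """
    profile: List[Tuple[int, int]] = []
    y = 0
    for op in ops:
        if op == "M":
            profile.append((y, y + 1))
            y += 1
        elif op == "X":
            profile.append((y, y))
        elif op == "Y":
            y += 1
        else:
            raise ValueError(f"unknown alignment operation: {op!r}")
    return profile, y


def area_distance(a: str, b: str) -> float:
    """Area enclosed between two alignment paths over the same sequence pair."""
    profile_a, end_a = column_profile(a)
    profile_b, end_b = column_profile(b)
    if len(profile_a) != len(profile_b) or end_a != end_b:
        raise ValueError("alignments must cover sequences of the same lengths")

    total = 0.0
    for (a0, a1), (b0, b1) in zip(profile_a, profile_b):
        # Within one column the height gap changes by at most 1, so it
        # cannot change sign strictly inside the column and the trapezoid
        # rule gives the exact area for that column.
        total += (abs(a0 - b0) + abs(a1 - b1)) / 2
    return total


if __name__ == "__main__":
    # Two length-2 sequences: a fully matched alignment versus one that
    # gaps the whole second sequence first, then the whole first sequence.
    print(area_distance("MM", "XXYY"))
```

Under this assumption, two alignments that differ only by a small shift enclose a small area even when few of their aligned pairs coincide exactly, which is one way to read the abstract's claim that an area-based objective tolerates measurement offsets.
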
Original language | English (US) |
---|---|
Pages (from-to) | 1100-1109 |
Number of pages | 10 |
Journal | Proceedings of Machine Learning Research |
Volume | 161 |
State | Published - 2021 |
Externally published | Yes |
Event | 37th Conference on Uncertainty in Artificial Intelligence, UAI 2021 - Virtual, Online |
Duration | Jul 27 2021 → Jul 30 2021 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability