Abstract
The rapid development of second-generation sequencing has brought about a significant increase in the amount of omics data. Integrating and analyzing these single-cell datasets is a challenging problem. In this paper, we propose a new model, called CVQVAE, based on a cross-trained VAE and strengthened by the Vector Quantization technique, for multi-omics data integration. CVQVAE projects data vectors from different omics onto a common latent space in such a way that (1) similar cells are close in the latent space and (2) the original biological information present in each of the omics (including cell cycle and trajectory) is preserved. Our model is trained and optimized solely on the multi-omics data and requires no additional information such as cell-type labels. We empirically demonstrate the stability and efficiency of our method in data integration (alignment) on datasets from a recent competition on Open Problems in Single Cell Analysis.
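To make the vector quantization step mentioned in the abstract concrete, the following is a minimal sketch of a generic nearest-codebook lookup, not the authors' implementation. The abstract does not specify the encoders, the codebook size, or the training losses, so the function name `quantize`, the toy codebook, and the two sets of latent vectors `z_a` and `z_b` (standing in for encoder outputs from two different omics) are all hypothetical.

```python
import numpy as np

def quantize(z, codebook):
    """Snap each latent vector in z (n, d) to its nearest codebook entry (k, d).

    Generic vector quantization as used in VQ-VAE-style models; this is an
    illustrative assumption, not the CVQVAE code.
    """
    # Squared Euclidean distance between every latent vector and every code.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)  # index of the closest code for each vector
    return codebook[idx], idx

# Hypothetical shared codebook with three 2-D codes.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])

# Hypothetical latent vectors from two modality-specific encoders.
z_a = np.array([[0.9, 1.2], [0.1, -0.2]])
z_b = np.array([[1.1, 0.8], [-0.9, 0.7]])

q_a, idx_a = quantize(z_a, codebook)
q_b, idx_b = quantize(z_b, codebook)
print(idx_a, idx_b)
```

In this toy setup the first vector of each modality maps to the same code, which illustrates how a shared codebook can pull nearby latent vectors from different omics onto a common discrete representation.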
Original language | English (US) |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | Proceedings of Machine Learning Research |
Volume | 200 |
State | Published - 2022 |
Event | 17th Machine Learning in Computational Biology Meeting, MLCB 2022 - Virtual, Online; Duration: Nov 21 2022 → Nov 22 2022 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability