CVQVAE: A representation learning based method for multi-omics single cell data integration

Tianyu Liu, Grant Greenberg, Ilan Shomorony

Research output: Contribution to journal › Conference article › peer-review


The rapid development of second-generation sequencing has brought about a significant increase in the amount of omics data. Integrating and analyzing these single-cell datasets is a challenging problem. In this paper, we propose a new model, called CVQVAE, based on a cross-trained VAE and strengthened by the Vector Quantization technique, for multi-omics data integration. CVQVAE projects data vectors from different omics onto a common latent space in such a way that (1) similar cells are close in the latent space and (2) the original biological information present in each of the omics (including cell cycle and trajectory) is preserved. Our model is trained and optimized solely on the multi-omics data and requires no additional information such as cell-type labels. We empirically demonstrate the stability and efficiency of our method in data integration (alignment) on datasets from a recent competition on Open Problems in Single Cell Analysis.
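
As a rough illustration of the vector quantization step named in the abstract, the sketch below snaps latent vectors from two modalities to their nearest entries in a shared codebook. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `quantize`, the toy codebook, and the example latents `z_a` and `z_b` are all hypothetical, and the encoders, cross-training, and loss terms of the actual model are not shown.

```python
import numpy as np

def quantize(z, codebook):
    """Map each row of z (n, d) to its nearest codebook row (k, d).

    Hypothetical helper for illustration; returns the quantized
    vectors and the index of the chosen code for each input row.
    """
    # Squared Euclidean distance between every latent and every code: shape (n, k)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

# Toy shared codebook with four 2-D codes (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Hypothetical latents for the same two cells as seen by two omics encoders.
z_a = np.array([[0.10, 0.05], [0.90, 1.10]])
z_b = np.array([[-0.05, 0.10], [1.05, 0.95]])

q_a, idx_a = quantize(z_a, codebook)
q_b, idx_b = quantize(z_b, codebook)

# Matching cells from the two modalities land on the same code.
print(idx_a.tolist(), idx_b.tolist())
```

In this toy setting, slightly different encodings of the same cell collapse onto one shared code, which is one intuition for why quantization could help place similar cells close together in a common latent space.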

Original language: English (US)
Pages (from-to): 1-15
Number of pages: 15
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 17th Machine Learning in Computational Biology Meeting, MLCB 2022 - Virtual, Online
Duration: Nov 21 2022 → Nov 22 2022

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

