CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion

Jiarui Sun, Girish Chowdhary

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion’s spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired from the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such initialization, CoMusion’s motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction considering past observations in the discrete cosine transformation (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining appropriate diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods across metrics, while demonstrating superior generation quality.

Original languageEnglish (US)
Title of host publicationComputer Vision – ECCV 2024 - 18th European Conference, Proceedings
EditorsAleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
PublisherSpringer
Pages18-36
Number of pages19
ISBN (Print)9783031730351
DOIs
StatePublished - 2025
Event18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: Sep 29 2024Oct 4 2024

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume15121 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference18th European Conference on Computer Vision, ECCV 2024
Country/TerritoryItaly
CityMilan
Period9/29/2410/4/24

Keywords

  • Diffusion Models
  • Stochastic Human Motion Prediction

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion'. Together they form a unique fingerprint.

Cite this