TY - GEN
T1 - CoMusion
T2 - 18th European Conference on Computer Vision, ECCV 2024
AU - Sun, Jiarui
AU - Chowdhary, Girish
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion’s spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired from the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such initialization, CoMusion’s motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction considering past observations in the discrete cosine transformation (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining appropriate diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods across metrics, while demonstrating superior generation quality.
AB - Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion’s spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired from the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such initialization, CoMusion’s motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction considering past observations in the discrete cosine transformation (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining appropriate diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods across metrics, while demonstrating superior generation quality.
KW - Diffusion Models
KW - Stochastic Human Motion Prediction
UR - http://www.scopus.com/inward/record.url?scp=85210817492&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85210817492&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-73036-8_2
DO - 10.1007/978-3-031-73036-8_2
M3 - Conference contribution
AN - SCOPUS:85210817492
SN - 9783031730351
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 18
EP - 36
BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer
Y2 - 29 September 2024 through 4 October 2024
ER -