InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals

Tomoyoshi Kimura, Xinlin Li, Osama Hanna, Yatong Chen, Yizhuo Chen, Denizhan Kara, Tianshi Wang, Jinyang Li, Xiaomin Ouyang, Shengzhong Liu, Mani Srivastava, Suhas Diggavi, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Standard multimodal self-supervised learning (SSL) algorithms treat cross-modal synchronization as implicit supervisory labels during pretraining, and therefore place high demands on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, because the heterogeneity and non-interpretability of time-series signals yield abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that addresses multimodal pair efficiency in the SSL setting by efficiently aligning pretrained unimodal representations across modalities. InfoMAE achieves this alignment with limited data pairs through a novel information-theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications evaluate how efficiently InfoMAE uses multimodal pairs to bridge pretrained unimodal models into a cohesive joint multimodal model. InfoMAE improves performance on downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
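The abstract does not state InfoMAE's actual objective. As a generic illustration only (not the paper's formulation), the two alignment levels it names can be sketched with standard losses: instance-level alignment as a symmetric InfoNCE contrastive loss over a small set of paired embeddings, and distribution-level alignment as a Gaussian-kernel maximum mean discrepancy (MMD) between the two modalities' embedding batches. All function names, the temperature, the kernel bandwidth and the weight `lam` are hypothetical choices made for this sketch.

```python
import numpy as np

def _log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def _normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize each row so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def instance_alignment_loss(za: np.ndarray, zb: np.ndarray,
                            temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of za is assumed to be paired with row i of zb."""
    za, zb = _normalize(za), _normalize(zb)
    logits = za @ zb.T / temperature                   # (N, N) similarity matrix
    idx = np.arange(len(za))
    loss_ab = -_log_softmax(logits, axis=1)[idx, idx].mean()  # match A -> B
    loss_ba = -_log_softmax(logits, axis=0)[idx, idx].mean()  # match B -> A
    return float(0.5 * (loss_ab + loss_ba))

def distribution_alignment_loss(za: np.ndarray, zb: np.ndarray,
                                bandwidth: float = 1.0) -> float:
    """Biased squared-MMD estimate with a Gaussian kernel; rows need not be paired."""
    def kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
        sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dist / (2 * bandwidth ** 2))
    return float(kernel(za, za).mean() + kernel(zb, zb).mean()
                 - 2 * kernel(za, zb).mean())

def alignment_objective(za: np.ndarray, zb: np.ndarray, lam: float = 1.0) -> float:
    # Hypothetical weighted sum of the instance-level and distribution-level terms.
    return instance_alignment_loss(za, zb) + lam * distribution_alignment_loss(za, zb)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    za = rng.normal(size=(8, 16))                 # 8 paired samples, modality A
    zb = za + 0.05 * rng.normal(size=(8, 16))     # modality B, roughly aligned with A
    print(alignment_objective(za, zb))
```

In this sketch only the InfoNCE term depends on which row is paired with which; the MMD term compares whole batches and would accept unpaired batches of different sizes, which is one generic way a distribution-level term can use data that lacks matched pairs.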

Original language: English (US)
Title of host publication: WWW 2025 - Proceedings of the ACM Web Conference
Publisher: Association for Computing Machinery
Pages: 3084-3095
Number of pages: 12
ISBN (Electronic): 9798400712746
State: Published - Apr 28 2025
Event: 34th ACM Web Conference, WWW 2025 - Sydney, Australia
Duration: Apr 28 2025 to May 2 2025

Publication series

Name: WWW 2025 - Proceedings of the ACM Web Conference

Conference

Conference: 34th ACM Web Conference, WWW 2025
Country/Territory: Australia
City: Sydney
Period: 4/28/25 to 5/2/25

Keywords

  • Internet of Things
  • Multimodal sensing
  • Self-supervised learning

ASJC Scopus subject areas

  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Safety, Risk, Reliability and Quality
  • Modeling and Simulation
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
