Connecting touch and vision via cross-modal prediction

Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizations of the learned representations of our model.
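The abstract describes the model only at a high level: a conditional adversarial network that translates between the visual and tactile domains while conditioning on the scale and location of the touch. The sketch below is a minimal PyTorch illustration of one way such conditioning could be wired up, by concatenating a spatial touch map as an extra input channel to a pix2pix-style generator and patch discriminator. The class names, layer choices, tensor shapes, and the `touch_map` encoding are assumptions made for this sketch, not the authors' released architecture.

```python
# Minimal sketch (assumed architecture, not the paper's code) of a
# conditional adversarial setup for vision -> touch prediction, where the
# generator is conditioned on a map marking the touch location and scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TouchConditionedGenerator(nn.Module):
    def __init__(self, img_channels=3, cond_channels=1, tactile_channels=3):
        super().__init__()
        in_ch = img_channels + cond_channels  # visual frame + touch location/scale map
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, tactile_channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, visual, touch_map):
        # Concatenate the visual input with the spatial touch-condition map.
        return self.net(torch.cat([visual, touch_map], dim=1))

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3 + 1 + 3):  # visual + touch map + tactile
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch-level real/fake logits
        )

    def forward(self, visual, touch_map, tactile):
        return self.net(torch.cat([visual, touch_map, tactile], dim=1))

# Usage sketch: one adversarial step on random tensors standing in for data.
G, D = TouchConditionedGenerator(), PatchDiscriminator()
visual = torch.randn(2, 3, 64, 64)       # visual frames
touch_map = torch.zeros(2, 1, 64, 64)    # 1 inside the touched region, 0 elsewhere
touch_map[:, :, 24:40, 24:40] = 1.0
real_tactile = torch.randn(2, 3, 64, 64)

bce = nn.BCEWithLogitsLoss()
fake_tactile = G(visual, touch_map)

# Discriminator step: real pairs vs. generated pairs.
d_real = D(visual, touch_map, real_tactile)
d_fake = D(visual, touch_map, fake_tactile.detach())
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Generator step: fool the discriminator plus an L1 reconstruction term.
d_fake_for_g = D(visual, touch_map, fake_tactile)
g_loss = bce(d_fake_for_g, torch.ones_like(d_fake_for_g)) + F.l1_loss(fake_tactile, real_tactile)
```

Injecting the touch location and scale as an extra binary input channel is just one common way to add spatial conditioning to an image-to-image GAN; the paper's actual conditioning mechanism may differ.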

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Publisher: IEEE Computer Society
Pages: 10601-10610
Number of pages: 10
ISBN (Electronic): 9781728132938
State: Published - Jun 2019
Externally published: Yes
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: Jun 16, 2019 – Jun 20, 2019

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2019-June
ISSN (Print): 1063-6919

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory: United States
City: Long Beach
Period: 6/16/19 – 6/20/19

Keywords

  • Image and Video Synthesis
  • Vision Applications and Systems

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
