TY - GEN
T1 - Embodied one-shot video recognition
T2 - 27th ACM International Conference on Multimedia, MM 2019
AU - Fu, Yuqian
AU - Wang, Chengrong
AU - Fu, Yanwei
AU - Wang, Yu-Xiong
AU - Bai, Cong
AU - Xue, Xiangyang
AU - Jiang, Yu-Gang
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1004300 and the National Natural Science Foundation of China under Grant U1509206. We would like to thank Dr. Ye Pan for his help.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/10/15
Y1 - 2019/10/15
N2 - One-shot learning aims to recognize novel target classes from few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not exactly the same. Based on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that it is difficult to maintain the disjoint-class assumption for videos, since video clips of target classes may appear within the videos of source classes. To address this issue, we introduce a novel setting, termed embodied-agents-based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. In this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning tasks. In addition, we propose a general video segment augmentation method that significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and also show the effectiveness of our augmentation approach for video recognition in the small-sample-size regime.
AB - One-shot learning aims to recognize novel target classes from few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not exactly the same. Based on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that it is difficult to maintain the disjoint-class assumption for videos, since video clips of target classes may appear within the videos of source classes. To address this issue, we introduce a novel setting, termed embodied-agents-based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. In this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning tasks. In addition, we propose a general video segment augmentation method that significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and also show the effectiveness of our augmentation approach for video recognition in the small-sample-size regime.
KW - Embodied Agents
KW - One-shot Learning
KW - Video Action Recognition
UR - http://www.scopus.com/inward/record.url?scp=85074867197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074867197&partnerID=8YFLogxK
U2 - 10.1145/3343031.3351015
DO - 10.1145/3343031.3351015
M3 - Conference contribution
AN - SCOPUS:85074867197
T3 - MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
SP - 411
EP - 419
BT - MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
PB - Association for Computing Machinery
Y2 - 21 October 2019 through 25 October 2019
ER -