TY - GEN
T1 - Learning models for actions and person-object interactions with transfer to question answering
AU - Mallya, Arun
AU - Lazebnik, Svetlana
N1 - Publisher Copyright: © Springer International Publishing AG 2016.
PY - 2016
Y1 - 2016
N2 - This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision on the level of individual person instances, and weighted loss to handle unbalanced training data. Further, we show how specialized features trained on these datasets can be used to improve accuracy on the Visual Question Answering (VQA) task, in the form of multiple choice fill-in-the-blank questions (Visual Madlibs). Specifically, we tackle two types of questions on person activity and person-object relationship and show improvements over generic features trained on the ImageNet classification task.
KW - Activity prediction
KW - Deep networks
KW - Visual question answering
UR - http://www.scopus.com/inward/record.url?scp=84990051104&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84990051104&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-46448-0_25
DO - 10.1007/978-3-319-46448-0_25
M3 - Conference contribution
AN - SCOPUS:84990051104
SN - 9783319464473
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 414
EP - 428
BT - Computer Vision - 14th European Conference, ECCV 2016, Proceedings
A2 - Leibe, Bastian
A2 - Matas, Jiri
A2 - Sebe, Nicu
A2 - Welling, Max
PB - Springer
T2 - 14th European Conference on Computer Vision, ECCV 2016
Y2 - 11 October 2016 through 14 October 2016
ER -