TY - JOUR
T1 - High-order attention models for visual question answering
AU - Schwartz, Idan
AU - Schwing, Alexander G.
AU - Hazan, Tamir
N1 - Acknowledgments: This research was supported in part by The Israel Science Foundation (grant No. 948/15). This material is based upon work supported in part by the National Science Foundation under Grant No. 1718221. We thank Nvidia for providing GPUs used in this research.
PY - 2017
Y1 - 2017
AB - The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
UR - http://www.scopus.com/inward/record.url?scp=85047015057&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047015057&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85047015057
SN - 1049-5258
VL - 2017-December
SP - 3665
EP - 3675
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
Y2 - 4 December 2017 through 9 December 2017
ER -