TY - GEN
T1 - Interpretable Visual Reasoning via Induced Symbolic Space
AU - Wang, Zhonghao
AU - Wang, Kai
AU - Yu, Mo
AU - Xiong, Jinjun
AU - Hwu, Wen-Mei
AU - Hasegawa-Johnson, Mark
AU - Shi, Humphrey
N1 - Funding Information:
This work is supported in part by the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizons Network.
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and achieve an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework named the object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. We then propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Experiments on the CLEVR and GQA datasets demonstrate that 1) OCCAM achieves a new state of the art without human-annotated functional programs, and 2) the induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance on objects represented either in visual features or in the induced symbolic concept space.
AB - We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and achieve an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework named the object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. We then propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Experiments on the CLEVR and GQA datasets demonstrate that 1) OCCAM achieves a new state of the art without human-annotated functional programs, and 2) the induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance on objects represented either in visual features or in the induced symbolic concept space.
UR - http://www.scopus.com/inward/record.url?scp=85117747838&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85117747838&partnerID=8YFLogxK
U2 - 10.1109/ICCV48922.2021.00189
DO - 10.1109/ICCV48922.2021.00189
M3 - Conference contribution
AN - SCOPUS:85117747838
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1858
EP - 1867
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Y2 - 11 October 2021 through 17 October 2021
ER -