Straight to the facts: Learning knowledge base retrieval for factual visual question answering

Medhini Narasimhan, Alexander G. Schwing

Research output: Chapter in Book/Report/Conference proceedingConference contribution


Question answering is an important task for autonomous agents and virtual assistants alike and was shown to support the disabled in efficiently navigating an overwhelming environment. Many existing methods focus on observation-based questions, ignoring our ability to seamlessly combine observed content with general knowledge. To understand interactions with a knowledge base, a dataset has been introduced recently and keyword matching techniques were shown to yield compelling results despite being vulnerable to misconceptions due to synonyms and homographs. To address this issue, we develop a learning-based approach which goes straight to the facts via a learned embedding space. We demonstrate state-of-the-art results on the challenging recently introduced fact-based visual question answering dataset, outperforming competing methods by more than 5%.

Original languageEnglish (US)
Title of host publicationComputer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
EditorsVittorio Ferrari, Cristian Sminchisescu, Yair Weiss, Martial Hebert
Number of pages18
ISBN (Print)9783030012366
StatePublished - 2018
Event15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: Sep 8 2018Sep 14 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11212 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


Other15th European Conference on Computer Vision, ECCV 2018


  • Fact based visual question answering
  • Knowledge bases

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)


Dive into the research topics of 'Straight to the facts: Learning knowledge base retrieval for factual visual question answering'. Together they form a unique fingerprint.

Cite this