MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih Fu Chang, Alexander Schwing, Heng Ji

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a predefined set of options. In addition, images in the real world, especially in news, have objects that are co-referential to the text, with complementary information from both modalities. In this paper, we present a new QA evaluation benchmark with 1,384 questions over news articles that require crossmedia grounding of objects in images onto text. Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question. In addition, we introduce a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, to automatically augment data that can provide weak supervision for this task. We evaluate both pipeline-based and end-to-end pretraining-based multimedia QA models on our benchmark, and show that they achieve promising performance, while considerably lagging behind human performance hence leaving large room for future work on this challenging new task.

Original languageEnglish (US)
Title of host publicationAAAI-22 Technical Tracks 10
PublisherAssociation for the Advancement of Artificial Intelligence
Pages11200-11208
Number of pages9
ISBN (Electronic)1577358767, 9781577358763
StatePublished - Jun 30 2022
Event36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online
Duration: Feb 22 2022Mar 1 2022

Publication series

NameProceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume36

Conference

Conference36th AAAI Conference on Artificial Intelligence, AAAI 2022
CityVirtual, Online
Period2/22/223/1/22

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding'. Together they form a unique fingerprint.

Cite this