TY - GEN
T1 - DeepMaven
T2 - 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
AU - Fung, Yi
AU - Wang, Han
AU - Wang, Tong
AU - Kebarighotbi, Ali
AU - Bansal, Mohit
AU - Ji, Heng
AU - Natarajan, Prem
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Long video content understanding poses a challenging set of research questions as it involves long-distance, cross-media reasoning and knowledge awareness. In this paper, we present a new benchmark for this problem domain, targeting the task of deep movie/TV question answering (QA) beyond previous work's focus on simple plot summary and short video moment settings. We define several baselines based on direct retrieval of relevant context for long-distance movie QA. Observing that real-world QAs may require higher-order multi-hop inferences, we further propose a novel framework, called the DEEPMAVEN, which extracts events, entities, and relations from the rich multimedia content in long videos to preconstruct movie knowledge graphs (movieKGs), and at the time of QA inference, complements general semantics with structured knowledge for more effective information retrieval and knowledge reasoning. We also introduce our recently collected DeepMovieQA dataset, including 1,000 long-form QA pairs from 41 hours of videos, to serve as a new and useful resource for future work. Empirical results show the DeepMaven performs competitively for both the new DeepMovieQA and the pre-existing MovieQA dataset.
UR - http://www.scopus.com/inward/record.url?scp=85159851183&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159851183&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.eacl-main.221
DO - 10.18653/v1/2023.eacl-main.221
M3 - Conference contribution
AN - SCOPUS:85159851183
T3 - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 3033
EP - 3043
BT - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 2 May 2023 through 6 May 2023
ER -