Video Event Extraction via Tracking Visual States of Arguments

Guang Yang, Manling Li, Jiajie Zhang, Xudong Lin, Heng Ji, Shih Fu Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Video event extraction aims to detect salient events in a video and to identify the arguments of each event together with their semantic roles. Existing methods focus on capturing the overall visual scene of each frame and ignore fine-grained argument-level information. Inspired by the definition of events as changes of state, we propose a novel framework that detects video events by tracking changes in the visual states of all involved arguments, which are expected to provide the most informative evidence for extracting video events. To capture the visual state changes of arguments, we decompose them into changes in pixels within objects, displacements of objects, and interactions among multiple arguments. We further propose Object State Embedding, Object Motion-aware Embedding, and Argument Interaction Embedding to encode and track these changes, respectively. Experiments on various video event extraction tasks demonstrate significant improvements over state-of-the-art models. In particular, on verb classification we achieve an absolute gain of 3.49% (a relative gain of 19.53%) in F1@5 on Video Situation Recognition. Our code is publicly available for research purposes.
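The two gains quoted for verb classification are linked by simple arithmetic. The short sketch below works that out, assuming the relative gain is defined as the absolute gain divided by the baseline score. The function name `implied_baseline` is hypothetical, and the baseline and improved scores it prints are derived from the two quoted figures rather than reported in this record, so they may differ slightly from the paper's tables because of rounding.

```python
def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain in percent.

    Assumes relative gain = absolute gain / baseline (an assumption, not a
    definition taken from the record).
    """
    if relative_gain_pct <= 0:
        raise ValueError("relative gain must be positive")
    return absolute_gain / (relative_gain_pct / 100.0)


absolute_gain = 3.49       # F1@5 points, as quoted in the abstract
relative_gain_pct = 19.53  # percent, as quoted in the abstract

baseline = implied_baseline(absolute_gain, relative_gain_pct)
improved = baseline + absolute_gain

print(f"implied baseline F1@5: {baseline:.2f}")  # implied baseline F1@5: 17.87
print(f"implied improved F1@5: {improved:.2f}")  # implied improved F1@5: 21.36
```

Under that assumption, the quoted gains correspond to F1@5 rising from roughly 17.87 to roughly 21.36.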

Original language: English (US)
Title of host publication: AAAI-23 Technical Tracks 3
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: American Association for Artificial Intelligence (AAAI) Press
Number of pages: 9
ISBN (Electronic): 9781577358800
State: Published - Jun 27 2023
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: Feb 7 2023 to Feb 14 2023

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023


Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States

ASJC Scopus subject areas

  • Artificial Intelligence

