Ordered Attention for Coherent Visual Storytelling

Tom Braude, Idan Schwartz, Alex Schwing, Ariel Shamir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. While each story sentence should describe a corresponding image, a coherent story also needs to be consistent and relate to both future and past images. Current approaches encode images independently, disregarding relations between images. Our approach learns to encode images with different interactions based on the story position (i.e., past image or future image). To this end, we develop a novel message-passing-like algorithm for ordered image attention (OIA) that collects interactions across all the images in the sequence. Finally, to generate the story's sentences, a second attention mechanism picks the important image attention vectors with an Image-Sentence Attention (ISA). The obtained results improve the METEOR score on the VIST dataset by 1%. Furthermore, a thorough human study confirms improvements and demonstrates that order-based interactions significantly improve coherency (64.20% vs. 28.70%). Source code is available at https://github.com/tomateb/OIAVist.git
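
The order-dependent interaction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal NumPy example under stated assumptions: each image is a single feature vector, and the hypothetical matrices w_past and w_future stand in for learned projections, so that an image scores earlier images differently from later ones. The function name ordered_attention and all parameter names are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ordered_attention(feats, w_past, w_future):
    """Illustrative order-aware attention over an image sequence.

    feats:    (n, d) array, one feature vector per image, in story order (n >= 2).
    w_past:   (d, d) hypothetical projection used when image j precedes image i.
    w_future: (d, d) hypothetical projection used when image j follows image i.
    Returns (weights, context): weights is (n, n) with rows summing to 1,
    context is (n, d), an attention-weighted summary of the other images.
    """
    n, d = feats.shape
    idx = np.arange(n)
    is_past = idx[None, :] < idx[:, None]    # [i, j] is True if j comes before i
    is_future = idx[None, :] > idx[:, None]  # [i, j] is True if j comes after i

    # Score every pair twice, once per projection, then pick by relative order.
    scores_past = feats @ (feats @ w_past).T
    scores_future = feats @ (feats @ w_future).T
    scores = np.where(is_past, scores_past, scores_future) / np.sqrt(d)

    # Mask the diagonal so an image only collects messages from other images.
    scores = np.where(is_past | is_future, scores, -np.inf)
    weights = softmax(scores, axis=1)
    return weights, weights @ feats

# Example usage with random stand-in features for a 5-image story.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
w_past = rng.normal(size=(8, 8))
w_future = rng.normal(size=(8, 8))
weights, context = ordered_attention(feats, w_past, w_future)
```

Using two separate projections is the simplest way to make the interaction depend on story position; the second stage described in the abstract (Image-Sentence Attention) would then attend over the resulting per-image vectors when generating each sentence, and is not shown here.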

Original language: English (US)
Title of host publication: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 3310-3318
Number of pages: 9
ISBN (Electronic): 9781450392037
DOIs
State: Published - Oct 10 2022
Event: 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal
Duration: Oct 10 2022 → Oct 14 2022

Publication series

Name: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference: 30th ACM International Conference on Multimedia, MM 2022
Country/Territory: Portugal
City: Lisboa
Period: 10/10/22 → 10/14/22

Keywords

  • ordered attention
  • visual grounding
  • visual storytelling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
