TY - GEN
T1 - Ordered Attention for Coherent Visual Storytelling
AU - Braude, Tom
AU - Schwartz, Idan
AU - Schwing, Alex
AU - Shamir, Ariel
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/10
Y1 - 2022/10/10
N2 - We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. While each story sentence should describe a corresponding image, a coherent story also needs to be consistent and relate to both future and past images. Current approaches encode images independently, disregarding relations between images. Our approach learns to encode images with different interactions based on the story position (i.e., past image or future image). To this end, we develop a novel message-passing-like algorithm for ordered image attention (OIA) that collects interactions across all the images in the sequence. Finally, to generate the story's sentences, a second attention mechanism picks the important image attention vectors with an Image-Sentence Attention (ISA). The obtained results improve the METEOR score on the VIST dataset by 1%. Furthermore, a thorough human study confirms improvements and demonstrates that order-based interactions significantly improve coherency (64.20% vs. 28.70%). Source code is available at https://github.com/tomateb/OIAVist.git
KW - ordered attention
KW - visual grounding
KW - visual storytelling
UR - http://www.scopus.com/inward/record.url?scp=85140715533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140715533&partnerID=8YFLogxK
U2 - 10.1145/3503161.3548161
DO - 10.1145/3503161.3548161
M3 - Conference contribution
AN - SCOPUS:85140715533
T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
SP - 3310
EP - 3318
BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
PB - Association for Computing Machinery
T2 - 30th ACM International Conference on Multimedia, MM 2022
Y2 - 10 October 2022 through 14 October 2022
ER -