Improving event extraction via multimodal integration

Tongtao Zhang, Spencer Whitehead, Hanwang Zhang, Hongzhi Li, Joseph Ellis, Lifu Huang, Wei Liu, Heng Ji, Shih-Fu Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


In this paper, we focus on improving Event Extraction (EE) by incorporating visual knowledge with words and phrases from text documents. We first discover visual patterns from large-scale text-image pairs in a weakly-supervised manner and then propose a multimodal event extraction algorithm in which the event extractor is jointly trained with textual features and visual patterns. Extensive experimental results on benchmark datasets demonstrate that the proposed multimodal EE method achieves significantly better performance on event extraction: an absolute 7.1% F-score gain on event trigger labeling and an 8.5% F-score gain on event argument labeling.
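The joint use of textual features and visual patterns described in the abstract can be sketched as a simple feature-fusion step feeding a linear trigger scorer. This is a minimal illustrative example with hypothetical names and toy values, not the authors' implementation, which jointly trains the extractor over visual patterns discovered from weakly-supervised text-image pairs.

```python
# Minimal sketch of multimodal feature fusion for event trigger labeling.
# All names and values are hypothetical, for illustration only.

def fuse_features(text_feats, visual_feats):
    """Concatenate a token's textual features with its visual pattern features."""
    return text_feats + visual_feats

def score_trigger(features, weights, bias=0.0):
    """Linearly score a candidate event trigger over the fused feature vector."""
    return sum(w * f for w, f in zip(weights, features)) + bias

# Toy example: a 3-dim text embedding plus a 2-dim visual pattern activation.
text_feats = [0.2, 0.5, 0.1]
visual_feats = [0.9, 0.3]
fused = fuse_features(text_feats, visual_feats)

weights = [0.1, 0.4, 0.0, 0.7, 0.2]  # one weight per fused dimension
score = score_trigger(fused, weights)
```

In a trained system the weights would be learned jointly over both modalities, so that visual evidence (e.g. a detected scene matching an event type) can raise or lower a trigger's score alongside the textual context.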

Original language: English (US)
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery
Number of pages: 9
ISBN (Electronic): 9781450349062
State: Published - Oct 23 2017
Externally published: Yes
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: Oct 23 2017 - Oct 27 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference


Other: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View


Keywords

  • Event extraction
  • Multimodal approach
  • Natural language processing
  • Visual pattern discovery

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Media Technology
  • Computer Vision and Pattern Recognition
  • Software


