A Study of Distributed Representations for Figures of Research Articles

Saar Kuzi, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Figures of research articles are entities that can be directly used in many application systems to assist researchers, making the representation of figures a problem worth studying. In this paper, we study the effectiveness of distributed representations, learned using deep neural networks, for figures. We learn representations using both text and image data and compare different model architectures and loss functions for the task. Furthermore, to overcome the lack of training data for the task, we propose and study a novel weak supervision approach for learning embedding vectors and show that it is more effective than using some of the pre-trained neural models as suggested by recent works. Experimental results using figures from the ACL Anthology show that distributed representations for research figures can be more effective than the previously studied bag-of-words representations. Yet, combining the two approaches can further improve performance. Finally, the results also show that these representations, while effective in general, can be sensitive to the learning approach used and that using both image data and text and a simple model architecture is the most effective approach.

Original language: English (US)
Title of host publication: Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Proceedings
Editors: Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, Fabrizio Sebastiani
Publisher: Springer
Pages: 284-297
Number of pages: 14
ISBN (Print): 9783030721121
State: Published - 2021
Event: 43rd European Conference on Information Retrieval Research, ECIR 2021 - Virtual, Online
Duration: Mar 28 2021 to Apr 1 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12656 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 43rd European Conference on Information Retrieval Research, ECIR 2021
City: Virtual, Online
Period: 3/28/21 to 4/1/21

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
