TY - GEN
T1 - A Study of Distributed Representations for Figures of Research Articles
AU - Kuzi, Saar
AU - Zhai, Cheng Xiang
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Figures of research articles are entities that can be directly used in many application systems to assist researchers, making the representation of figures a problem worth studying. In this paper, we study the effectiveness of distributed representations, learned using deep neural networks, for figures. We learn representations using both text and image data and compare different model architectures and loss functions for the task. Furthermore, to overcome the lack of training data for the task, we propose and study a novel weak supervision approach for learning embedding vectors and show that it is more effective than using some of the pre-trained neural models as suggested by recent works. Experimental results using figures from the ACL Anthology show that distributed representations for research figures can be more effective than the previously studied bag-of-words representations. Yet, combining the two approaches can further improve performance. Finally, the results also show that these representations, while effective in general, can be sensitive to the learning approach used and that using both image data and text and a simple model architecture is the most effective approach.
UR - http://www.scopus.com/inward/record.url?scp=85107360907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107360907&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72113-8_19
DO - 10.1007/978-3-030-72113-8_19
M3 - Conference contribution
AN - SCOPUS:85107360907
SN - 9783030721121
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 284
EP - 297
BT - Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Proceedings
A2 - Hiemstra, Djoerd
A2 - Moens, Marie-Francine
A2 - Mothe, Josiane
A2 - Perego, Raffaele
A2 - Potthast, Martin
A2 - Sebastiani, Fabrizio
PB - Springer
T2 - 43rd European Conference on Information Retrieval Research, ECIR 2021
Y2 - 28 March 2021 through 1 April 2021
ER -