AnaDE1.0: A Novel Data Set for Benchmarking Analogy Detection and Extraction

Bhavya Bhavya, Shradha Sehgal, Jinjun Xiong, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Textual analogies that make comparisons between two concepts are often used for explaining complex ideas, creative writing, and scientific discovery. In this paper, we propose and study a new task, called Analogy Detection and Extraction (AnaDE), which includes three synergistic sub-tasks: 1) detecting documents containing analogies, 2) extracting text segments that make up the analogy, and 3) identifying the source and target concepts being compared. To facilitate the study of this new task, we create a benchmark dataset by scraping Metamia.com and investigate the performance of state-of-the-art models on all sub-tasks to establish the first-generation benchmark results for this new task. We find that the Longformer model achieves the best performance on all three sub-tasks, demonstrating its effectiveness for handling long texts. Moreover, smaller models fine-tuned on our dataset perform better than non-fine-tuned ChatGPT, suggesting high task difficulty. Overall, the models achieve high performance on document detection, suggesting that they could be used to develop applications like analogy search engines. Further, there is substantial room for improvement on the segment and concept extraction tasks.

Original language: English (US)
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics (ACL)
Pages: 1723-1737
Number of pages: 15
ISBN (Electronic): 9798891760882
State: Published - 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - St. Julian's, Malta
Duration: Mar 17 2024 → Mar 22 2024

Publication series

Name: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 1

Conference

Conference: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024
Country/Territory: Malta
City: St. Julian's
Period: 3/17/24 → 3/22/24

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
