Shorten the Long Tail for Rare Entity and Event Extraction

Pengfei Yu, Heng Ji

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The distribution of knowledge elements such as entity types and event types is long-tailed in natural language. Hence information extraction datasets naturally conform a long-tailed distribution. Although imbalanced datasets can teach the model about the useful real-world bias, deep learning models may learn features not generalizable to rare or unseen mentions of entities or events during evaluation, especially for rare types without sufficient training instances. Existing approaches for the long-tailed learning problem seek to manipulate the training data by re-balancing, augmentation or introducing extra prior knowledge. In comparison, we propose to handle the generalization challenge by making the evaluation instances closer to the frequent training cases. We design a new transformation module that transforms infrequent candidate mention representation during evaluation with the average mention representation in the training dataset. Experimental results on classic benchmarks on three entity or event extraction datasets demonstrate the effectiveness of our framework.

Original languageEnglish (US)
Title of host publicationEACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages1331-1342
Number of pages12
ISBN (Electronic)9781959429449
StatePublished - 2023
Event17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Dubrovnik, Croatia
Duration: May 2 2023May 6 2023

Publication series

NameEACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
Country/TerritoryCroatia
CityDubrovnik
Period5/2/235/6/23

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Shorten the Long Tail for Rare Entity and Event Extraction'. Together they form a unique fingerprint.

Cite this