TY - GEN
T1 - TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
AU - Huang, Kuan-Hao
AU - Hsu, I-Hung
AU - Parekh, Tanmay
AU - Xie, Zhiyu
AU - Zhang, Zixuan
AU - Natarajan, Premkumar
AU - Chang, Kai-Wei
AU - Peng, Nanyun
AU - Ji, Heng
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains, includes 14 recent methodologies, and conducts a comprehensive benchmark reevaluation. We also evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance. Inspired by our reevaluation results and findings, we discuss the role of event extraction in the current NLP era, as well as future challenges and insights derived from TextEE. We believe TextEE, the first standardized comprehensive benchmarking tool, will significantly facilitate future event extraction research.
AB - Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains, includes 14 recent methodologies, and conducts a comprehensive benchmark reevaluation. We also evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance. Inspired by our reevaluation results and findings, we discuss the role of event extraction in the current NLP era, as well as future challenges and insights derived from TextEE. We believe TextEE, the first standardized comprehensive benchmarking tool, will significantly facilitate future event extraction research.
UR - https://www.scopus.com/pages/publications/85205316194
UR - https://www.scopus.com/pages/publications/85205316194#tab=citedBy
U2 - 10.18653/v1/2024.findings-acl.760
DO - 10.18653/v1/2024.findings-acl.760
M3 - Conference contribution
AN - SCOPUS:85205316194
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 12804
EP - 12825
BT - The 62nd Annual Meeting of the Association for Computational Linguistics
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
Y2 - 11 August 2024 through 16 August 2024
ER -