TY - GEN
T1 - Instruct and Extract
T2 - 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
AU - Jiao, Yizhu
AU - Zhong, Ming
AU - Li, Sha
AU - Zhao, Ruining
AU - Ouyang, Siru
AU - Ji, Heng
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named INSTRUCTIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on INSTRUCTIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.
AB - Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named INSTRUCTIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on INSTRUCTIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.
UR - http://www.scopus.com/inward/record.url?scp=85184818470&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184818470&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85184818470
T3 - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 10030
EP - 10051
BT - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -