TY - GEN
T1 - Improving question answering with external knowledge
AU - Pan, Xiaoman
AU - Sun, Kai
AU - Yu, Dian
AU - Chen, Jianshu
AU - Ji, Heng
AU - Cardie, Claire
AU - Yu, Dong
N1 - Publisher Copyright:
© 2019 MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering. All rights reserved.
PY - 2019
Y1 - 2019
N2 - We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: In comparison to the previous state-of-the-art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: Performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.
AB - We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: In comparison to the previous state-of-the-art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: Performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.
UR - https://www.scopus.com/pages/publications/85095004064
UR - https://www.scopus.com/pages/publications/85095004064#tab=citedBy
U2 - 10.18653/v1/D19-5804
DO - 10.18653/v1/D19-5804
M3 - Conference contribution
AN - SCOPUS:85095004064
T3 - MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering
SP - 27
EP - 37
BT - MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering
PB - Association for Computational Linguistics (ACL)
T2 - 2nd Workshop on Machine Reading for Question Answering, MRQA@EMNLP 2019
Y2 - 4 November 2019
ER -