TY - GEN
T1 - Indirect supervision for relation extraction using question-answer pairs
AU - Wu, Zeqiu
AU - Ren, Xiang
AU - Xu, Frank F.
AU - Li, Ji
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/2/2
Y1 - 2018/2/2
N2 - Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. For example,we want to identify the relationship "president-of" between entities "Donald Trump" and "United States" in a sentence expressing such a relation. Traditional RE models have heavily relied on human-annotated corpus for training, which can be costly in generating labeled data and become obstacles when dealing with more relation types. Thus, more RE extraction systems have shifted to be built upon training data automatically acquired by linking to knowledge bases (distant supervision). However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy. In recent years, as increasing attention has been brought to tackling question-answering (QA) tasks, user feedback or datasets of such tasks become more accessible. In this paper, we propose a novel framework, ReQuest, to leverage question-answer pairs as an indirect source of supervision for relation extraction, and study how to use such supervision to reduce noise induced from DS. Our model jointly embeds relation mentions, types, QA entity mention pairs and text features in two low-dimensional spaces (RE and QA), where objects with same relation types or semantically similar question-answer pairs have similar representations. Shared features connect these two spaces, carrying clearer semantic knowledge from both sources. ReQuest, then use these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results achieve an average of 11% improvement in F1 score on two public RE datasets combined with TREC QA dataset. Codes and datasets can be downloaded at https://github.com/ellenmellon/ReQuest.
AB - Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. For example,we want to identify the relationship "president-of" between entities "Donald Trump" and "United States" in a sentence expressing such a relation. Traditional RE models have heavily relied on human-annotated corpus for training, which can be costly in generating labeled data and become obstacles when dealing with more relation types. Thus, more RE extraction systems have shifted to be built upon training data automatically acquired by linking to knowledge bases (distant supervision). However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy. In recent years, as increasing attention has been brought to tackling question-answering (QA) tasks, user feedback or datasets of such tasks become more accessible. In this paper, we propose a novel framework, ReQuest, to leverage question-answer pairs as an indirect source of supervision for relation extraction, and study how to use such supervision to reduce noise induced from DS. Our model jointly embeds relation mentions, types, QA entity mention pairs and text features in two low-dimensional spaces (RE and QA), where objects with same relation types or semantically similar question-answer pairs have similar representations. Shared features connect these two spaces, carrying clearer semantic knowledge from both sources. ReQuest, then use these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results achieve an average of 11% improvement in F1 score on two public RE datasets combined with TREC QA dataset. Codes and datasets can be downloaded at https://github.com/ellenmellon/ReQuest.
UR - http://www.scopus.com/inward/record.url?scp=85040591295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040591295&partnerID=8YFLogxK
U2 - 10.1145/3159652.3159709
DO - 10.1145/3159652.3159709
M3 - Conference contribution
AN - SCOPUS:85040591295
T3 - WSDM 2018 - Proceedings of the 11th ACM International Conference on Web Search and Data Mining
SP - 646
EP - 654
BT - WSDM 2018 - Proceedings of the 11th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery
T2 - 11th ACM International Conference on Web Search and Data Mining, WSDM 2018
Y2 - 5 February 2018 through 9 February 2018
ER -