TY - GEN
T1 - Corpus-Based Relation Extraction by Identifying and Refining Relation Patterns
AU - Zhou, Sizhe
AU - Ge, Suyu
AU - Shen, Jiaming
AU - Han, Jiawei
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Automated relation extraction without extensive human-annotated data is a crucial yet challenging task in text mining. Existing studies typically use lexical patterns to label a small set of high-precision relation triples and then employ distributional methods to enhance detection recall. This precision-first approach works well for common relation types but struggles with unconventional and infrequent ones. In this work, we propose a recall-first approach that first leverages high-recall patterns (e.g., a per:siblings relation normally requires both the head and tail entities to be of the person type) to provide initial candidate relation triples with weak labels and then clusters these candidate relation triples in a latent spherical space to extract high-quality weak supervision. Specifically, we present a novel framework, RClus, where each relation triple is represented by its head/tail entity types and the shortest dependency path between the entity mentions. RClus first applies high-recall patterns to narrow down each relation type’s candidate space. Then, it embeds candidate relation triples in a latent space and conducts spherical clustering to further filter out noisy candidates and identify high-quality weakly-labeled triples. Finally, RClus leverages the resulting triples to prompt-tune a pre-trained language model and uses it for improved extraction coverage. We conduct extensive experiments on three public datasets and demonstrate that RClus outperforms the weakly-supervised baselines by a large margin and achieves generally better performance than fully-supervised methods in low-resource settings.
KW - Latent Space Clustering
KW - Relation Extraction
KW - Weak Supervision
UR - http://www.scopus.com/inward/record.url?scp=85174440242&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174440242&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-43421-1_2
DO - 10.1007/978-3-031-43421-1_2
M3 - Conference contribution
AN - SCOPUS:85174440242
SN - 9783031434204
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 20
EP - 38
BT - Machine Learning and Knowledge Discovery in Databases
A2 - Koutra, Danai
A2 - Plant, Claudia
A2 - Gomez Rodriguez, Manuel
A2 - Baralis, Elena
A2 - Bonchi, Francesco
PB - Springer
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023
Y2 - 18 September 2023 through 22 September 2023
ER -