Corpus-Based Relation Extraction by Identifying and Refining Relation Patterns

Sizhe Zhou, Suyu Ge, Jiaming Shen, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Automated relation extraction without extensive human-annotated data is a crucial yet challenging task in text mining. Existing studies typically use lexical patterns to label a small set of high-precision relation triples and then employ distributional methods to enhance detection recall. This precision-first approach works well for common relation types but struggles with unconventional and infrequent ones. In this work, we propose a recall-first approach that first leverages high-recall patterns (e.g., a per:siblings relation normally requires both the head and tail entities to be of the person type) to provide initial candidate relation triples with weak labels and then clusters these candidate relation triples in a latent spherical space to extract high-quality weak supervision. Specifically, we present a novel framework, RClus, where each relation triple is represented by its head/tail entity types and the shortest dependency path between the entity mentions. RClus first applies high-recall patterns to narrow down each relation type’s candidate space. Then, it embeds candidate relation triples in a latent space and conducts spherical clustering to further filter out noisy candidates and identify high-quality weakly-labeled triples. Finally, RClus uses these triples to prompt-tune a pre-trained language model, which is then applied to improve extraction coverage. We conduct extensive experiments on three public datasets and demonstrate that RClus outperforms the weakly-supervised baselines by a large margin and achieves generally better performance than fully-supervised methods in low-resource settings.
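
The first two stages described above (type-based candidate filtering, then clustering on the unit sphere) can be illustrated with a minimal sketch. This is not the RClus implementation: the Candidate class, the TYPE_PATTERNS table, the toy two-dimensional vectors, the choice of k, and the min_cos threshold are all hypothetical, and a plain spherical k-means stands in for the latent-space clustering used in the paper. The prompt-tuning stage is not shown.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    head_type: str
    tail_type: str
    path: str             # toy stand-in for the shortest dependency path
    vector: List[float]   # hypothetical embedding of the triple

# Hypothetical high-recall type patterns: relation -> required (head, tail) types.
TYPE_PATTERNS = {"per:siblings": ("PERSON", "PERSON")}

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0 else [x / norm for x in v]

def cosine(u, v):
    # Inputs are unit vectors, so the dot product equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))

def type_filter(cands, relation):
    # Stage 1: keep every candidate whose entity types match the pattern.
    return [c for c in cands if (c.head_type, c.tail_type) == TYPE_PATTERNS[relation]]

def spherical_kmeans(vectors, k, iters=20):
    # Stage 2: k-means on the unit sphere, using cosine similarity.
    points = [normalize(v) for v in vectors]
    centroids = [points[i] for i in range(k)]  # deterministic init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return assign, centroids

def refine(cands, relation, k=2, min_cos=0.9):
    # Keep members of the largest cluster that lie close to its centroid.
    typed = type_filter(cands, relation)
    assign, centroids = spherical_kmeans([c.vector for c in typed], k)
    biggest = max(range(k), key=assign.count)
    return [c for c, a in zip(typed, assign)
            if a == biggest and cosine(normalize(c.vector), centroids[biggest]) >= min_cos]

# Toy data (made up for illustration only).
cands = [
    Candidate("PERSON", "PERSON", "brother_of", [1.0, 0.1]),
    Candidate("PERSON", "PERSON", "sister_of", [0.9, 0.2]),
    Candidate("PERSON", "PERSON", "sibling_of", [1.0, 0.0]),
    Candidate("PERSON", "PERSON", "works_with", [0.0, 1.0]),
    Candidate("PERSON", "ORG", "employee_of", [0.5, 0.5]),
]
kept = refine(cands, "per:siblings")
print([c.path for c in kept])  # ['brother_of', 'sister_of', 'sibling_of']
```

In this toy run the type pattern removes the PERSON/ORG candidate, and the clustering step then separates the off-topic PERSON/PERSON candidate from the three sibling-like ones, which mirrors the recall-first, refine-later ordering the abstract describes.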

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: Research Track - European Conference, ECML PKDD 2023, Proceedings
Editors: Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, Francesco Bonchi
Number of pages: 19
ISBN (Print): 9783031434204
State: Published - 2023
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy
Duration: Sep 18 2023 to Sep 22 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14172 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023


Keywords

  • Latent Space Clustering
  • Relation Extraction
  • Weak Supervision

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

