Integrating local context and global cohesiveness for open information extraction

Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art Open IE systems.
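The abstract names a "translating-based objective" for scoring extracted tuples but does not spell it out. As a rough illustration only, the sketch below assumes a TransE-style formulation, in which a tuple (head, predicate, tail) scores well when the head embedding plus the predicate embedding lands close to the tail embedding. The function names, the margin value, and the two-dimensional embeddings are all hypothetical and are not taken from ReMine.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def translation_score(head, predicate, tail):
    """Assumed TransE-style score: negative distance between head + predicate and tail.

    Higher (closer to zero) means the tuple is more plausible.
    """
    translated = [h + p for h, p in zip(head, predicate)]
    return -l2_distance(translated, tail)


def margin_loss(positive, negative, margin=1.0):
    """Hinge loss: zero once the positive tuple outscores the corrupted one by `margin`."""
    return max(
        0.0,
        margin - translation_score(*positive) + translation_score(*negative),
    )


# Hypothetical toy embeddings, chosen by hand for illustration.
emb = {
    "Paris": [1.0, 0.0],
    "Tokyo": [4.0, 0.0],
    "France": [1.0, 1.0],
    "is capital of": [0.0, 1.0],
}

good = (emb["Paris"], emb["is capital of"], emb["France"])
bad = (emb["Tokyo"], emb["is capital of"], emb["France"])  # corrupted head

print(translation_score(*good))  # distance 0 between translated head and tail
print(translation_score(*bad))   # distance 3, so a lower score
print(margin_loss(good, bad))    # no loss: the good tuple already wins by the margin
```

In a corpus-level setting, a loss of this kind would be summed over many extracted tuples and their corrupted counterparts, which is one way global statistics could feed back into sentence-level extraction quality.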

Original language: English (US)
Title of host publication: WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 42-50
Number of pages: 9
ISBN (Electronic): 9781450359405
DOI: https://doi.org/10.1145/3289600.3291030
State: Published - Jan 30 2019
Event: 12th ACM International Conference on Web Search and Data Mining, WSDM 2019 - Melbourne, Australia
Duration: Feb 11 2019 to Feb 15 2019

Publication series

Name: WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining

Conference

Conference: 12th ACM International Conference on Web Search and Data Mining, WSDM 2019
Country: Australia
City: Melbourne
Period: 2/11/19 to 2/15/19

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Computer Science Applications

Cite this

Zhu, Q., Ren, X., Shang, J., Zhang, Y., El-Kishky, A., & Han, J. (2019). Integrating local context and global cohesiveness for open information extraction. In WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining (pp. 42-50). (WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining). Association for Computing Machinery, Inc. https://doi.org/10.1145/3289600.3291030

@inproceedings{8416c3ae78ae46fda8ea98a327398d81,
title = "Integrating local context and global cohesiveness for open information extraction",
abstract = "Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art Open IE systems.",
author = "Qi Zhu and Xiang Ren and Jingbo Shang and Yu Zhang and Ahmed El-Kishky and Jiawei Han",
year = "2019",
month = "1",
day = "30",
doi = "10.1145/3289600.3291030",
language = "English (US)",
series = "WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining",
publisher = "Association for Computing Machinery, Inc",
pages = "42--50",
booktitle = "WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining",

}

TY - GEN
T1 - Integrating local context and global cohesiveness for open information extraction
AU - Zhu, Qi
AU - Ren, Xiang
AU - Shang, Jingbo
AU - Zhang, Yu
AU - El-Kishky, Ahmed
AU - Han, Jiawei
PY - 2019/1/30
Y1 - 2019/1/30
AB - Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art Open IE systems.

UR - http://www.scopus.com/inward/record.url?scp=85061742382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061742382&partnerID=8YFLogxK
U2 - 10.1145/3289600.3291030
DO - 10.1145/3289600.3291030
M3 - Conference contribution
AN - SCOPUS:85061742382
T3 - WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining
SP - 42
EP - 50
BT - WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
ER -