Pattern Discovery for Wide-Window Open Information Extraction in Biomedical Literature

Qi Li, Xuan Wang, Yu Zhang, Fei Ling, Cathy H. Wu, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Open information extraction (OpenIE) is an important task in the biomedical domain. The goal of OpenIE is to automatically extract structured information from unstructured text with little or no supervision: it aims to extract all relation tuples from a corpus without requiring pre-specified relation types. Existing tools may extract ill-structured or incomplete information, or fail on biomedical literature because of its long and complicated sentences. In this paper, we propose a novel pattern-based information extraction method for wide-window entities (WW-PIE). WW-PIE first uses dependency parsing to break down long sentences and then uses frequent textual patterns to extract high-quality information. Hierarchical grouping of the patterns organizes and structures the extractions so that they are straightforward and precise. Consequently, compared with existing OpenIE tools, WW-PIE produces structured output that can be used directly in downstream applications. WW-PIE is also capable of extracting n-ary and nested relation structures, which are less studied in existing methods. Extensive experiments on a real-world biomedical corpus of PubMed abstracts demonstrate the power of WW-PIE at extracting precise and well-structured information.
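The frequent-pattern step described in the abstract can be illustrated with a small, simplified sketch. This is not the WW-PIE implementation: it assumes entities are already tagged with types, skips the dependency-parsing and hierarchical-grouping stages entirely, and treats a whole sentence as one pattern. The toy corpus, the function names (`to_pattern`, `mine_frequent_patterns`, `extract`), and the `min_support` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Toy corpus (invented for illustration): each sentence is a list of
# (token, entity_type) pairs; entity_type is None for ordinary words.
CORPUS = [
    [("BRCA1", "GENE"), ("is", None), ("associated", None), ("with", None),
     ("breast cancer", "DISEASE")],
    [("TP53", "GENE"), ("is", None), ("associated", None), ("with", None),
     ("lung cancer", "DISEASE")],
    [("aspirin", "CHEMICAL"), ("inhibits", None), ("COX1", "GENE")],
    [("EGFR", "GENE"), ("was", None), ("measured", None), ("in", None),
     ("glioma", "DISEASE")],
]


def to_pattern(sentence):
    """Replace each entity mention with a typed placeholder such as $GENE."""
    return tuple(f"${etype}" if etype else token for token, etype in sentence)


def mine_frequent_patterns(corpus, min_support=2):
    """Keep the textual patterns that occur at least min_support times."""
    counts = Counter(to_pattern(sentence) for sentence in corpus)
    return {pattern for pattern, count in counts.items() if count >= min_support}


def extract(corpus, patterns):
    """Emit (relation phrase, entity tuple) for sentences matching a frequent pattern."""
    results = []
    for sentence in corpus:
        if to_pattern(sentence) in patterns:
            entities = tuple(token for token, etype in sentence if etype)
            relation = " ".join(token for token, etype in sentence if not etype)
            results.append((relation, entities))
    return results


patterns = mine_frequent_patterns(CORPUS)
tuples = extract(CORPUS, patterns)
print(tuples)
# [('is associated with', ('BRCA1', 'breast cancer')),
#  ('is associated with', ('TP53', 'lung cancer'))]
```

Only the pattern "$GENE is associated with $DISEASE" reaches the support threshold here, so the two sentences that occur once each produce no extraction. Because the entity tuple can hold any number of entities, the same representation would also accommodate n-ary relations, although this sketch does not handle nested ones.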

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 420-427
Number of pages: 8
ISBN (Electronic): 9781538654880
DOIs: https://doi.org/10.1109/BIBM.2018.8621375
State: Published - Jan 21 2019
Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: Dec 3 2018 - Dec 6 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country: Spain
City: Madrid
Period: 12/3/18 - 12/6/18

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Li, Q., Wang, X., Zhang, Y., Ling, F., Wu, C. H., & Han, J. (2019). Pattern Discovery for Wide-Window Open Information Extraction in Biomedical Literature. In H. Schmidt, D. Griol, H. Wang, J. Baumbach, H. Zheng, Z. Callejas, X. Hu, J. Dickerson, & L. Zhang (Eds.), Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 (pp. 420-427). [8621375] (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2018.8621375