TY - GEN
T1 - Textual Evidence Mining via Spherical Heterogeneous Information Network Embedding
AU - Wang, Xuan
AU - Zhang, Yu
AU - Chauhan, Aabhas
AU - Li, Qi
AU - Han, Jiawei
N1 - Funding Information:
Research was sponsored in part by US DARPA KAIROS Program No. FA8750-19-2-1004 and SocialSim Program No. W911NF-17-C-0099, National Science Foundation IIS-19-56151, IIS-17-41317, IIS 17-04532, and IIS 16-18481, and DTRA HDTRA11810026. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and should not be interpreted as necessarily representing the views, either expressed or implied, of DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright annotation hereon.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/10
Y1 - 2020/12/10
AB - Scientific literature, as one of the major knowledge resources, provides abundant textual evidence that has great potential to support high-quality scientific hypothesis validation. In this paper, we study the problem of textual evidence mining in scientific literature: given a scientific hypothesis as a query triplet, find the textual evidence sentences in scientific literature that support the input query. A critical challenge for textual evidence mining in scientific literature is to retrieve high-quality textual evidence without human supervision, because it is non-trivial to obtain a large set of human-annotated articles containing evidence sentences. To tackle this challenge, we propose EvidenceMiner, a high-quality textual evidence retrieval method for scientific literature that requires no human-annotated training examples. To achieve high-quality textual evidence retrieval, we leverage heterogeneous information from both existing knowledge bases and massive unstructured text. We propose to construct a large heterogeneous information network (HIN) to build connections between the user-input queries and the candidate evidence sentences. Based on the constructed HIN, we propose a novel HIN embedding method that directly embeds the nodes onto a spherical space to improve the retrieval performance. Quantitative experiments on a large biomedical literature corpus (over 4 million sentences) demonstrate that EvidenceMiner significantly outperforms baseline methods for unsupervised textual evidence retrieval. Case studies also demonstrate that our HIN construction and embedding greatly benefit many downstream applications such as textual evidence interpretation and synonym meta-pattern discovery.
KW - heterogeneous information network
KW - spherical graph embedding
KW - textual evidence mining
UR - http://www.scopus.com/inward/record.url?scp=85103858995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103858995&partnerID=8YFLogxK
U2 - 10.1109/BigData50022.2020.9377958
DO - 10.1109/BigData50022.2020.9377958
M3 - Conference contribution
AN - SCOPUS:85103858995
T3 - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
SP - 828
EP - 837
BT - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
A2 - Wu, Xintao
A2 - Jermaine, Chris
A2 - Xiong, Li
A2 - Hu, Xiaohua Tony
A2 - Kotevska, Olivera
A2 - Lu, Siyuan
A2 - Xu, Weijia
A2 - Aluru, Srinivas
A2 - Zhai, Chengxiang
A2 - Al-Masri, Eyhab
A2 - Chen, Zhiyuan
A2 - Saltz, Jeff
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Big Data, Big Data 2020
Y2 - 10 December 2020 through 13 December 2020
ER -