TY - JOUR
T1 - Mining strong relevance between heterogeneous entities from unstructured biomedical data
AU - Ji, Ming
AU - He, Qi
AU - Han, Jiawei
AU - Spangler, Scott
N1 - Publisher Copyright: © 2015, The Author(s). Copyright: Copyright 2015 Elsevier B.V., All rights reserved.
PY - 2015/7/8
Y1 - 2015/7/8
N2 - Huge volumes of biomedical text data discussing different biomedical entities are being generated every day. Hidden in those unstructured data are the strong relevance relationships between those entities, which are critical for many interesting applications including building knowledge bases for the biomedical domain and semantic search among biomedical entities. In this paper, we study the problem of discovering strong relevance between heterogeneously typed biomedical entities from massive biomedical text data. We first build an entity correlation graph from data, in which the collection of paths linking two heterogeneous entities offers rich semantic contexts for their relationships, especially those paths following the patterns of top-k selected meta paths inferred from data. Guided by such meta paths, we design a novel relevance measure, named EntityRel, to compute the strong relevance between two heterogeneous entities. Our intuition is that two entities of heterogeneous types are strongly relevant if they have strong direct links or they are linked closely to other strongly relevant heterogeneous entities along paths following the selected patterns. We provide experimental results on mining strong relevance between drugs and diseases. More than 20 million MEDLINE abstracts and 5 types of biological entities (Drug, Disease, Compound, Target, MeSH) are used to construct the entity correlation graph. A prototype drug search engine for disease queries is implemented. Extensive comparisons are made against multiple state-of-the-art methods on examples of Drug–Disease relevance discovery.
AB - Huge volumes of biomedical text data discussing different biomedical entities are being generated every day. Hidden in those unstructured data are the strong relevance relationships between those entities, which are critical for many interesting applications including building knowledge bases for the biomedical domain and semantic search among biomedical entities. In this paper, we study the problem of discovering strong relevance between heterogeneously typed biomedical entities from massive biomedical text data. We first build an entity correlation graph from data, in which the collection of paths linking two heterogeneous entities offers rich semantic contexts for their relationships, especially those paths following the patterns of top-k selected meta paths inferred from data. Guided by such meta paths, we design a novel relevance measure, named EntityRel, to compute the strong relevance between two heterogeneous entities. Our intuition is that two entities of heterogeneous types are strongly relevant if they have strong direct links or they are linked closely to other strongly relevant heterogeneous entities along paths following the selected patterns. We provide experimental results on mining strong relevance between drugs and diseases. More than 20 million MEDLINE abstracts and 5 types of biological entities (Drug, Disease, Compound, Target, MeSH) are used to construct the entity correlation graph. A prototype drug search engine for disease queries is implemented. Extensive comparisons are made against multiple state-of-the-art methods on examples of Drug–Disease relevance discovery.
KW - Biomedical text data
KW - Context-aware
KW - Heterogeneous
KW - Meta path
KW - Relevance
UR - http://www.scopus.com/inward/record.url?scp=84930482906&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930482906&partnerID=8YFLogxK
U2 - 10.1007/s10618-014-0396-4
DO - 10.1007/s10618-014-0396-4
M3 - Article
AN - SCOPUS:84930482906
SN - 1384-5810
VL - 29
SP - 976
EP - 998
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 4
ER -