TY - JOUR
T1 - Combining Open-domain and Biomedical Knowledge for Topic Recognition in Consumer Health Questions
AU - Mrabet, Yassine
AU - Kilicoglu, Halil
AU - Roberts, Kirk
AU - Demner-Fushman, Dina
N1 - Copyright: This record is sourced from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
PY - 2016
Y1 - 2016
N2 - Determining the main topics in consumer health questions is a crucial step in their processing as it allows narrowing the search space to a specific semantic context. In this paper, we propose a topic recognition approach based on biomedical and open-domain knowledge bases. In the first step of our method, we recognize named entities in consumer health questions using an unsupervised method that relies on a biomedical knowledge base, UMLS, and an open-domain knowledge base, DBpedia. In the next step, we cast topic recognition as a binary classification problem of deciding whether a named entity is the question topic or not. We evaluated our approach on a dataset from the National Library of Medicine (NLM), introduced in this paper, and another from the Genetic and Rare Diseases Information Center (GARD). The combination of knowledge bases outperformed the results obtained by individual knowledge bases by up to 16.5% F1 and achieved state-of-the-art performance. Our results demonstrate that combining open-domain knowledge bases with biomedical knowledge bases can lead to a substantial improvement in understanding user-generated health content.
UR - http://www.scopus.com/inward/record.url?scp=85027496196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027496196&partnerID=8YFLogxK
M3 - Article
C2 - 28269888
AN - SCOPUS:85027496196
SN - 1559-4076
VL - 2016
SP - 914
EP - 923
JO - AMIA Annual Symposium Proceedings
JF - AMIA Annual Symposium Proceedings
ER -