Combining Open-domain and Biomedical Knowledge for Topic Recognition in Consumer Health Questions

Yassine Mrabet, Halil Kilicoglu, Kirk Roberts, Dina Demner-Fushman

Research output: Contribution to journal › Article › peer-review


Determining the main topics in consumer health questions is a crucial step in their processing, as it allows narrowing the search space to a specific semantic context. In this paper, we propose a topic recognition approach based on biomedical and open-domain knowledge bases. In the first step of our method, we recognize named entities in consumer health questions using an unsupervised method that relies on a biomedical knowledge base, UMLS, and an open-domain knowledge base, DBpedia. In the next step, we cast topic recognition as a binary classification problem: deciding whether or not a named entity is the question topic. We evaluated our approach on a dataset from the National Library of Medicine (NLM), introduced in this paper, and another from the Genetic and Rare Disease Information Center (GARD). The combination of knowledge bases outperformed the individual knowledge bases by up to 16.5% F1 and achieved state-of-the-art performance. Our results demonstrate that combining open-domain knowledge bases with biomedical knowledge bases can lead to a substantial improvement in understanding user-generated health content.
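
The two-step shape described in the abstract (entity recognition against two knowledge bases, then a per-entity binary topic decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy term sets are hypothetical stand-ins for UMLS and DBpedia lookups, and the hand-set scoring rule in `is_topic` is an assumption standing in for a trained binary classifier. All names below are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons standing in for UMLS and DBpedia lookups.
BIOMEDICAL_TERMS = {"cystic fibrosis", "diabetes"}
OPEN_DOMAIN_TERMS = {"diabetes", "insulin pump"}


@dataclass(frozen=True)
class Candidate:
    text: str             # matched term
    start: int            # character offset of the first match
    in_biomedical: bool   # found in the biomedical lexicon
    in_open_domain: bool  # found in the open-domain lexicon


def find_candidates(question):
    """Step 1 (sketch): plain dictionary lookup against both lexicons."""
    lowered = question.lower()
    found = []
    for term in sorted(BIOMEDICAL_TERMS | OPEN_DOMAIN_TERMS):
        pos = lowered.find(term)
        if pos >= 0:
            found.append(
                Candidate(term, pos, term in BIOMEDICAL_TERMS, term in OPEN_DOMAIN_TERMS)
            )
    return found


def is_topic(candidate, question_length):
    """Step 2 (sketch): binary topic / not-topic decision.

    A hand-set rule is used here purely for illustration; it is an
    assumption, not the classifier or features used in the paper.
    """
    score = 0.0
    score += 1.0 if candidate.in_biomedical else 0.0
    score += 0.5 if candidate.in_open_domain else 0.0
    # Earlier mentions get a small boost (assumed feature).
    score += 0.5 * (1 - candidate.start / max(question_length, 1))
    return score >= 1.0


def recognize_topics(question):
    """Run both steps and return the entities judged to be topics."""
    return [c.text for c in find_candidates(question) if is_topic(c, len(question))]
```

Calling `recognize_topics("My son has cystic fibrosis. Would an insulin pump help?")` finds two candidate entities and keeps only the one that the toy rule scores as the topic. In a realistic setting, the lookup would be replaced by queries to the actual knowledge bases and the rule by a classifier trained on annotated questions.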

Original language: English (US)
Pages (from-to): 914-923
Number of pages: 10
Journal: AMIA Annual Symposium Proceedings
State: Published - 2016
Externally published: Yes

ASJC Scopus subject areas

  • General Medicine

