Evaluating automated entity extraction with respect to drug and non-drug treatment strategies

Jinlong Guo, Catherine Blake, Yingjun Guan

Research output: Contribution to journal › Article › peer-review


Objectives: The treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and for reviewers evaluating different interventions. Much of the existing work on treatment extraction from the biomedical literature has focused on pharmacological interventions. However, non-pharmacological interventions (e.g., exercise and diet), which are frequently used to address chronic conditions, are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments.
Methods: We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug or non-drug treatments. We then designed three methods to identify the treatments and evaluated each system with respect to drug and non-drug treatments. The first method relies solely on a knowledge base (here, MetaMap). The second method is a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the previous two.
Results/discussion: Results show that MetaMap, when used with high-precision semantic types, performs better for drug than for non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach shows a smaller performance difference between drug and non-drug treatments than the KB-based approach (difference in F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain comes mainly from the non-drug treatments (0.03–0.08 improvement in F1), while the drug treatments benefit little from the combination approach (0–0.03 improvement in F1).
Conclusion: These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs, whereas conditions treated with a combination of drug and non-drug interventions, or primarily with non-drug interventions, should use automated tools that combine machine learning with a knowledge-based approach to achieve optimal performance.
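The comparison above rests on F1 scores reported separately for drug and non-drug treatments and for all treatments together. As a rough illustration only, the following Python sketch computes exact-match precision, recall, and F1 per treatment type. The mention format (sentence id, start offset, end offset, treatment type), the function names, and the toy data are all assumptions made for this example; the abstract does not specify the matching criterion or the data representation the authors used.

```python
def prf1(gold, predicted):
    """Exact-match precision, recall, and F1 over two sets of mentions."""
    tp = len(gold & predicted)  # mentions found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def evaluate_by_type(gold, predicted):
    """Score mentions per treatment type and over all treatments.

    Each mention is a hypothetical (sentence_id, start, end, type) tuple,
    where type is "drug" or "non-drug".
    """
    scores = {}
    for t in sorted({m[3] for m in gold | predicted}):
        gold_t = {m for m in gold if m[3] == t}
        pred_t = {m for m in predicted if m[3] == t}
        scores[t] = prf1(gold_t, pred_t)
    scores["total"] = prf1(gold, predicted)
    return scores


# Toy annotations, invented for illustration (not data from the study).
gold = {
    (1, 0, 9, "drug"),
    (1, 20, 28, "non-drug"),
    (2, 5, 14, "drug"),
    (3, 0, 4, "non-drug"),
}
predicted = {
    (1, 0, 9, "drug"),
    (2, 5, 14, "drug"),
    (3, 0, 4, "non-drug"),
    (3, 10, 18, "non-drug"),  # spurious mention
}

scores = evaluate_by_type(gold, predicted)
for name, (p, r, f) in scores.items():
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Reporting the per-type scores alongside the total makes a gap such as the one described for MetaMap visible, since a strong total can hide weaker performance on the less frequent type.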

Original language: English (US)
Article number: 103177
Journal: Journal of Biomedical Informatics
State: Published - Jun 2019


Keywords

  • Entity recognition
  • Machine learning
  • MetaMap
  • Treatment extraction

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

