Evaluating automated entity extraction with respect to drug and non-drug treatment strategies

Jinlong Guo, Catherine Lesley Blake, Yingjun Guan

Research output: Contribution to journal, Article

Abstract

Objectives: The treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and for reviewers who are evaluating different interventions. Much of the existing work on treatment extraction from the biomedical literature has focused on pharmacological interventions. However, non-pharmacological interventions (e.g., exercise, diet) that are frequently used to address chronic conditions are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments.
Methods: We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug or non-drug treatments. We then designed three methods to identify the treatments and evaluated each with respect to drug and non-drug treatments. The first method is based solely on a knowledge base (here, MetaMap). The second is a machine learning model trained mainly on contextual features (ML_only). The third is a combination approach that integrates the first two.
Results/discussion: MetaMap, when used with high-precision semantic types, performs better for drug than for non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach shows a smaller performance difference between drug and non-drug treatments than the KB-based approaches (difference in F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain comes mainly from the non-drug treatments (0.03–0.08 improvement in F1), while the drug treatments benefit little from the combination approach (0–0.03 improvement in F1).
Conclusion: These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs, whereas conditions that are treated with a combination of drug and non-drug interventions, or primarily with non-drug interventions, should use automated tools that combine machine learning and a knowledge-based approach to achieve optimal performance.

Original language: English (US)
Article number: 103177
Journal: Journal of Biomedical Informatics
Volume: 94
DOI: 10.1016/j.jbi.2019.103177
State: Published - Jun 1 2019

Fingerprint

Learning systems
Pharmaceutical Preparations
Drug therapy
Nutrition
Medical problems
Therapeutics
Semantics
Randomized Controlled Trials
Point-of-Care Systems
Knowledge Bases
Drug Combinations
PubMed
Pharmacology
Exercise
Breast Neoplasms
Diet
Physicians
Machine Learning

Keywords

  • Entity recognition
  • Machine learning
  • MetaMap
  • Treatment extraction

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Evaluating automated entity extraction with respect to drug and non-drug treatment strategies. / Guo, Jinlong; Blake, Catherine Lesley; Guan, Yingjun.

In: Journal of Biomedical Informatics, Vol. 94, 103177, 01.06.2019.

@article{55d2c9d8e6634d16bcc43ef313f8396c,
title = "Evaluating automated entity extraction with respect to drug and non-drug treatment strategies",
abstract = "Objectives: Treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and reviewers who are evaluating different interventions. Much of existing work on treatment extraction from the biomedical literature has focused on the extraction of pharmacological interventions. However, non-pharmacological interventions (e.g., exercise, diet, etc.) that are frequently used to address chronic conditions are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments. Methods: We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug/non-drug treatments. We then designed three methods to identify the treatments and evaluated the systems with respect to drug/non-drug treatments. The first method is solely based on knowledge base (here we used MetaMap). The second method is based on a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the previous two approaches. Results/discussion: Results show that MetaMap, when used with high precision semantic types, has better performance for drug compared to non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach has smaller performance difference between drug and non-drug treatments compared with the KB-based approach (F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain mainly comes from the non-drug treatments (0.03–0.08 improvement in F1), while the drug treatments do not benefit much from the combination approach (0–0.03 improvement in F1). 
Conclusion: These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs whereas conditions that are treated with either a combination of drug and non-drug interventions or primarily non-drug interventions should use automated tools that combine machine learning and a knowledge-based approach to achieve optimal performance.",
keywords = "Entity recognition, Machine learning, MetaMap, Treatment extraction",
author = "Jinlong Guo and Blake, {Catherine Lesley} and Yingjun Guan",
year = "2019",
month = "6",
day = "1",
doi = "10.1016/j.jbi.2019.103177",
language = "English (US)",
volume = "94",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Evaluating automated entity extraction with respect to drug and non-drug treatment strategies

AU - Guo, Jinlong

AU - Blake, Catherine Lesley

AU - Guan, Yingjun

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Objectives: Treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and reviewers who are evaluating different interventions. Much of existing work on treatment extraction from the biomedical literature has focused on the extraction of pharmacological interventions. However, non-pharmacological interventions (e.g., exercise, diet, etc.) that are frequently used to address chronic conditions are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments. Methods: We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug/non-drug treatments. We then designed three methods to identify the treatments and evaluated the systems with respect to drug/non-drug treatments. The first method is solely based on knowledge base (here we used MetaMap). The second method is based on a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the previous two approaches. Results/discussion: Results show that MetaMap, when used with high precision semantic types, has better performance for drug compared to non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach has smaller performance difference between drug and non-drug treatments compared with the KB-based approach (F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain mainly comes from the non-drug treatments (0.03–0.08 improvement in F1), while the drug treatments do not benefit much from the combination approach (0–0.03 improvement in F1). 
Conclusion: These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs whereas conditions that are treated with either a combination of drug and non-drug interventions or primarily non-drug interventions should use automated tools that combine machine learning and a knowledge-based approach to achieve optimal performance.

KW - Entity recognition

KW - Machine learning

KW - MetaMap

KW - Treatment extraction

UR - http://www.scopus.com/inward/record.url?scp=85064566344&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064566344&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2019.103177

DO - 10.1016/j.jbi.2019.103177

M3 - Article

VL - 94

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

M1 - 103177

ER -