Comparing breast cancer treatments using automatically detected surrogate and clinically relevant outcomes entities from text

Catherine Blake, Rebecca Kehm

Research output: Contribution to journal › Article › peer-review


Population, intervention, comparison and outcome (PICO) facets of clinical studies are required both by physicians in a clinical setting and by reviewers as they compare the effectiveness of different treatment strategies. Automated methods developed for the first three of these facets identify entities, but outcome detection has been limited to identifying the entire sentence. We frame outcome detection as a noun phrase prediction task and use semi-supervised learning to detect new outcomes (also known as endpoints) from the method section of 88K MEDLINE abstracts. A manual analysis showed that 96.7% of all outcomes can be captured using a noun phrase representation. Among the machine learning classifiers, the Support Vector Machine produced higher precision, F1-score, and accuracy than the General Linear Model when evaluated against both the initial gold standard of survivorship seed terms and a manual gold standard that considered all outcomes. However, the best model does not employ machine learning, but rather leverages list structure, achieving 90.14 precision, 60.69 recall, 75.41 F1-score, and 92.60 accuracy with respect to the manual gold standard of all outcomes. Finally, we developed a silver standard with a precision of 89.28 and recall of 86.77 compared to the manual gold standard and used the silver standard to identify all outcomes reported for five breast cancer treatments. The increased precision afforded by this approach reveals that, in contrast to chemotherapy and targeted therapy, the surrogate outcome disease-free survival (DFS) is reported more frequently than the clinically relevant outcome overall survival (OS) for hormone therapies, which is consistent with findings that DFS translates into firm OS improvements in a hormone therapy setting.
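The abstract evaluates predicted outcome noun phrases against a gold standard using precision, recall, and F1-score. As a rough illustration of what such a comparison involves, the following is a minimal sketch, assuming exact matching of normalized phrases; the function names and the example phrases are hypothetical, and the authors' actual matching rules and evaluation code are not described here. The figures quoted in the abstract are the authors' reported values and are not recomputed by this sketch.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Compare predicted outcome phrases with gold phrases by exact match.

    Assumption: a prediction counts as correct only if its normalized form
    appears in the gold set; partial overlaps are not credited.
    """
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example data, not taken from the study.
gold = ["overall survival", "disease free survival", "response rate", "toxicity"]
predicted = ["Overall survival", "disease free  survival", "tumor size"]

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice; a real evaluation of noun phrases might also credit head-noun or partial matches, which would change all three scores.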

Original language: English (US)
Article number: 100005
Journal: Journal of Biomedical Informatics: X
State: Published - Mar 2019


Keywords

  • Breast cancer outcomes
  • Evidence-based medicine
  • Machine learning
  • Outcome extraction
  • Systematic reviews
  • Text mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

