TY - JOUR
T1 - Comparing breast cancer treatments using automatically detected surrogate and clinically relevant outcomes entities from text
AU - Blake, Catherine
AU - Kehm, Rebecca
N1 - Publisher Copyright: © 2019
PY - 2019/3
Y1 - 2019/3
AB - Population, intervention, comparison, and outcome (PICO) facets of clinical studies are required both by physicians in a clinical setting and by reviewers as they compare the effectiveness of different treatment strategies. Automated methods developed for the first three of these facets identify entities, but outcome detection has been limited to identifying the entire sentence. We frame outcome detection as a noun phrase prediction task and use semi-supervised learning to detect new outcomes (also known as endpoints) from the method section of 88K MEDLINE abstracts. A manual analysis showed that 96.7% of all outcomes can be captured using a noun phrase representation. Among the machine learning classifiers, the Support Vector Machine produced higher precision, F1-score, and accuracy than the General Linear Model when evaluated against both the initial gold standard of survivorship seed terms and a manual gold standard that considered all outcomes. However, the best model does not employ machine learning but instead leverages list structure, yielding 90.14 precision, 60.69 recall, 75.41 F1-score, and 92.60 accuracy with respect to the manual gold standard of all outcomes. Finally, we developed a silver standard with a precision of 89.28 and recall of 86.77 compared to the manual gold standard and used the silver standard to identify all outcomes reported for five breast cancer treatments. The increased precision afforded by this approach reveals that, in contrast to chemotherapy and targeted therapy, the surrogate outcome disease-free survival (DFS) is reported more frequently than the clinically relevant outcome overall survival (OS) for hormone therapies, which is consistent with findings that DFS translates into firm OS improvements in a hormone therapy setting.
KW - Breast cancer outcomes
KW - Evidence-based medicine
KW - Machine learning
KW - Outcome extraction
KW - Systematic reviews
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85062478674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062478674&partnerID=8YFLogxK
U2 - 10.1016/j.yjbinx.2019.100005
DO - 10.1016/j.yjbinx.2019.100005
M3 - Article
C2 - 34384581
AN - SCOPUS:85062478674
SN - 2590-177X
VL - 1
JO - Journal of Biomedical Informatics: X
JF - Journal of Biomedical Informatics: X
M1 - 100005
ER -