TY - JOUR
T1 - Machine learning helps identify new drug mechanisms in triple-negative breast cancer
AU - Athreya, Arjun P.
AU - Gaglio, Alan J.
AU - Cairns, Junmei
AU - Kalari, Krishna R.
AU - Weinshilboum, Richard M.
AU - Wang, Liewei
AU - Kalbarczyk, Zbigniew T.
AU - Iyer, Ravishankar K.
N1 - Funding Information:
Manuscript received May 14, 2017; accepted May 14, 2018. Date of publication July 2, 2018; date of current version July 31, 2018. This work was supported in part by the Mayo Clinic and Illinois Alliance Fellowship for Technology-Based Healthcare Research, in part by the CompGen Fellowship, in part by the IBM Faculty Award, in part by the National Science Foundation under Grants CNS 13-37732, CNS 16-24790, and CNS 16-4615, in part by the National Institutes of Health under Grants R01 GM28157, R01 CA196648, U19 GM61388 (The Pharmacogenomics Research Network), Breast SPORE P50CA116201, and U19 GM61388, and in part by the Mayo Clinic Center for Individualized Medicine. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF and NIH. This paper was presented at the 2016 IEEE International Conference on Bioinformatics and Biomedicine [1]. (Corresponding author: Ravishankar K. Iyer.) A. P. Athreya, Z. T. Kalbarczyk, and R. K. Iyer are with the Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA (e-mail: rkiyer@illinois.edu).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7
Y1 - 2018/7
AB - This paper demonstrates the ability of machine learning approaches to identify, among the 23,398 genes of the human genome, a few genes to experiment on in the laboratory to establish new drug mechanisms. As a case study, this paper uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods, validated by hierarchical clustering, can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1% of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Laboratory experiments on CDC42, one of the identified genes associated with reduced breast cancer incidence, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.
KW - Breast Cancer
KW - Metformin
KW - Mixture-Models
KW - Model-Based Learning
KW - Single-Cell RNASeq
KW - Unsupervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85049338133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049338133&partnerID=8YFLogxK
U2 - 10.1109/TNB.2018.2851997
DO - 10.1109/TNB.2018.2851997
M3 - Article
C2 - 29994716
AN - SCOPUS:85049338133
SN - 1536-1241
VL - 17
SP - 251
EP - 259
JO - IEEE Transactions on Nanobioscience
JF - IEEE Transactions on Nanobioscience
IS - 3
M1 - 8401331
ER -