TY - JOUR
T1 - Multi-modality machine learning predicting Parkinson’s disease
AU - Makarious, Mary B.
AU - Leonard, Hampton L.
AU - Vitale, Dan
AU - Iwaki, Hirotaka
AU - Sargent, Lana
AU - Dadu, Anant
AU - Violich, Ivo
AU - Hutchins, Elizabeth
AU - Saffo, David
AU - Bandres-Ciga, Sara
AU - Kim, Jonggeol Jeff
AU - Song, Yeajin
AU - Maleknia, Melina
AU - Bookman, Matt
AU - Nojopranoto, Willy
AU - Campbell, Roy H.
AU - Hashemi, Sayed Hadi
AU - Botia, Juan A.
AU - Carter, John F.
AU - Craig, David W.
AU - Van Keuren-Jensen, Kendall
AU - Morris, Huw R.
AU - Hardy, John A.
AU - Blauwendraat, Cornelis
AU - Singleton, Andrew B.
AU - Faghri, Faraz
AU - Nalls, Mike A.
N1 - H.L.L., H.I., F.F., D.V., Y.S., and M.A.N. declare that they are consultants employed by Data Tecnica International, whose participation in this work is part of a consulting agreement between the US National Institutes of Health and said company. H.R.M. is employed by UCL. In the last 24 months, he reports paid consultancy from Biogen, Biohaven, and Lundbeck; lecture fees/honoraria from the Wellcome Trust and the Movement Disorders Society; and research grants from Parkinson’s UK, Cure Parkinson’s Trust, PSP Association, CBD Solutions, Drake Foundation, Medical Research Council, and Michael J Fox Foundation. H.R.M. is also a co-applicant on a patent application related to C9ORF72 (Method for diagnosing a neurodegenerative disease, PCT/GB2012/052140). The study’s funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. Authors M.B.M., A.D., I.V., E.H., D.S., S.B.C., J.J.K., M.B., W.N., R.H.C., S.H.H., J.A.B., J.F.C., M.M., D.W.C., K.V.K.-J., J.A.H., C.B., and A.B.S. declare no competing interests. All authors and the public can access all data and statistical programming code used in this project for the analyses and results generation. M.A.N. takes final responsibility for the decision to submit the paper for publication.
This work was supported in part by the Intramural Research Program of the National Institute on Aging and the National Institute of Neurological Disorders and Stroke (project number Z01-AG000949-02). Data used in the preparation of this article were obtained from the AMP PD Knowledge Platform. For up-to-date information on the study, please visit https://www.amp-pd.org. AMP PD, a public-private partnership, is managed by the FNIH and funded by Celgene, GSK, the Michael J. Fox Foundation for Parkinson’s Research, the National Institute of Neurological Disorders and Stroke, Pfizer, Sanofi, and Verily. Clinical data and biosamples used in the preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) and the Parkinson’s Disease Biomarkers Program (PDBP). PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners; the full names of all PPMI funding partners can be found at http://www.ppmi-info.org/fundingpartners. The PPMI Investigators have not participated in reviewing the data analysis or content of the manuscript. For up-to-date information on the study, visit http://www.ppmi-info.org. The Parkinson’s Disease Biomarker Program (PDBP) consortium is supported by the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health. A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. The PDBP Investigators have not participated in reviewing the data analysis or content of the manuscript. PDBP sample and clinical data collection is supported under grants by NINDS: U01NS082134, U01NS082157, U01NS082151, U01NS082137, U01NS082148, and U01NS082133. A portion of the resources used in the preparation of this article were obtained from the Global Parkinson’s Genetics Program (GP2). GP2 is funded by the Aligning Science against Parkinson’s (ASAP) initiative and implemented by The Michael J. Fox Foundation for Parkinson’s Research (https://parkinsonsroadmap.org/gp2/). For a complete list of GP2 members, see https://parkinsonsroadmap.org/gp2/. The workflow diagram was created with BioRender.com.
PY - 2022/12
Y1 - 2022/12
N2 - Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug–gene interactions. We performed automated ML on multimodal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but the accuracy of these is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals to monitor within a health registry or biobank to prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
AB - Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug–gene interactions. We performed automated ML on multimodal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but the accuracy of these is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals to monitor within a health registry or biobank to prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
UR - http://www.scopus.com/inward/record.url?scp=85127675184&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127675184&partnerID=8YFLogxK
U2 - 10.1038/s41531-022-00288-w
DO - 10.1038/s41531-022-00288-w
M3 - Article
C2 - 35365675
AN - SCOPUS:85127675184
SN - 2373-8057
VL - 8
JO - npj Parkinson's Disease
JF - npj Parkinson's Disease
IS - 1
M1 - 35
ER -