Predicting Readmission Charges Billed by Hospitals: Machine Learning Approach

Deepika Gopukumar, Abhijeet Ghoshal, Huimin Zhao

Research output: Contribution to journal › Article › peer-review


Background: The Centers for Medicare and Medicaid Services projects that health care costs will continue to grow over the next few years. Rising readmission costs contribute significantly to increasing health care costs. Multiple areas of health care, including readmissions, have benefited from the application of various machine learning algorithms in several ways.

Objective: We aimed to identify suitable models for predicting readmission charges billed by hospitals. Our literature review revealed that this application of machine learning is underexplored. We used various predictive methods, ranging from glass-box models (such as regularization techniques) to black-box models (such as deep learning–based models).

Methods: We defined readmissions as readmission with the same major diagnostic category (RSDC) and all-cause readmission category (RADC). For these readmission categories, 576,701 and 1,091,580 individuals, respectively, were identified from the Nationwide Readmission Database of the Healthcare Cost and Utilization Project by the Agency for Healthcare Research and Quality for 2013. Linear regression, lasso regression, elastic net, ridge regression, eXtreme gradient boosting (XGBoost), and a deep learning model based on multilayer perceptron (MLP) were the 6 machine learning algorithms we tested for RSDC and RADC through 10-fold cross-validation.

Results: Our preliminary analysis using a data-driven approach revealed that within RADC, the subsequent readmission charge billed per patient was higher than the previous charge for 541,090 individuals, and this number was 319,233 for RSDC. The top 3 major diagnostic categories (MDCs) for such instances were the same for RADC and RSDC. The average readmission charge billed was higher than the previous charge for 21 of the MDCs in the case of RSDC, whereas this was the case for only 13 of the MDCs in RADC. We recommend XGBoost and the deep learning model based on MLP for predicting readmission charges. The following performance metrics were obtained for XGBoost: (1) RADC (mean absolute percentage error [MAPE]=3.121%; root mean squared error [RMSE]=0.414; mean absolute error [MAE]=0.317; root relative squared error [RRSE]=0.410; relative absolute error [RAE]=0.399; normalized RMSE [NRMSE]=0.040; mean absolute deviation [MAD]=0.031) and (2) RSDC (MAPE=3.171%; RMSE=0.421; MAE=0.321; RRSE=0.407; RAE=0.393; NRMSE=0.041; MAD=0.031). The performance metrics obtained for the MLP-based deep neural network were as follows: (1) RADC (MAPE=3.103%; RMSE=0.413; MAE=0.316; RRSE=0.410; RAE=0.397; NRMSE=0.040; MAD=0.031) and (2) RSDC (MAPE=3.202%; RMSE=0.427; MAE=0.326; RRSE=0.413; RAE=0.399; NRMSE=0.041; MAD=0.032). Repeated measures ANOVA revealed that the mean RMSE differed significantly across models (P<.001). Post hoc tests using the Bonferroni correction method indicated that the mean RMSE of the deep learning and XGBoost models was statistically significantly (P<.001) lower than that of all other models, namely linear regression, elastic net, lasso, and ridge regression.

Conclusions: Models built using XGBoost and MLP are suitable for predicting readmission charges billed by hospitals. The MDCs allow models to accurately predict hospital readmission charges.
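
For readers unfamiliar with the error metrics reported above, the following is a minimal sketch of how several of them are commonly computed. It is not the authors' code. The function name regression_metrics is hypothetical, the formulas are the usual textbook definitions, and normalizing RMSE by the range of the observed values is an assumption (other normalizations, such as by the mean, are also in use). MAD is omitted because the abstract does not say which definition was used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return common regression error metrics for paired observations.

    Hypothetical helper using textbook definitions; the definitions
    used in the article itself may differ (notably for NRMSE).
    Assumes y_true has no zeros (needed for MAPE) and is not constant.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    sq_err = [e * e for e in errors]

    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(sq_err) / n)
    # MAPE: mean of |error| / |actual|, expressed as a percentage.
    mape = 100.0 * sum(a / abs(t) for a, t in zip(abs_err, y_true)) / n
    # RAE and RRSE compare the model against always predicting the mean.
    rae = sum(abs_err) / sum(abs(t - mean_true) for t in y_true)
    rrse = math.sqrt(sum(sq_err) / sum((t - mean_true) ** 2 for t in y_true))
    # Assumption: NRMSE is RMSE divided by the range of observed values.
    nrmse = rmse / (max(y_true) - min(y_true))

    return {"MAPE": mape, "RMSE": rmse, "MAE": mae,
            "RRSE": rrse, "RAE": rae, "NRMSE": nrmse}


# Toy values for illustration only; they are not data from the study.
metrics = regression_metrics([10.0, 12.0, 8.0, 10.0], [11.0, 11.0, 8.0, 10.0])
```

RAE and RRSE values below 1, as reported for both recommended models, indicate lower error than a baseline that always predicts the mean of the observed values.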

Original language: English (US)
Article number: e37578
Journal: JMIR Medical Informatics
Issue number: 8
State: Published - Aug 1 2022
Externally published: Yes

Keywords

  • machine learning
  • predictive analytics
  • predictive models
  • readmission analytics
  • readmission charges
  • readmissions

ASJC Scopus subject areas

  • Health Information Management
  • Health Informatics

