TY - JOUR
T1 - DRG-LLaMA
T2 - tuning LLaMA model to predict diagnosis-related group for hospitalized patients
AU - Wang, Hanyin
AU - Gao, Chufan
AU - Dantona, Christopher
AU - Hull, Bryan
AU - Sun, Jimeng
N1 - This research was supported by NSF awards SCH-2205289, IIS-2034479, and SCH-2014438. The funder played no role in the study design, data collection, analysis, and interpretation of data, or the writing of this manuscript.
PY - 2024/12
Y1 - 2024/12
N2 - In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRG assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA’s performance correlates with increased model parameters and input context lengths.
AB - In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRG assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA’s performance correlates with increased model parameters and input context lengths.
UR - http://www.scopus.com/inward/record.url?scp=85182723058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182723058&partnerID=8YFLogxK
U2 - 10.1038/s41746-023-00989-3
DO - 10.1038/s41746-023-00989-3
M3 - Article
C2 - 38253711
AN - SCOPUS:85182723058
SN - 2398-6352
VL - 7
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 16
ER -