TY - GEN
T1 - Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
AU - Wang, Hanyin
AU - Gao, Chufan
AU - Liu, Bolun
AU - Xu, Qiping
AU - Hussein, Guleid
AU - Labban, Mohamad El
AU - Iheasirim, Kingsley
AU - Korsapati, Hariprasad
AU - Outcalt, Chuck
AU - Sun, Jimeng
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pretraining, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (92.8%) of individual evaluations rated the notes generated by LLaMA-Clinic as “acceptable” or higher across three criteria: real-world readiness, completeness, and accuracy. In the more challenging “Assessment and Plan” section, LLaMA-Clinic matched physician-authored notes in real-world readiness score. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a “best practice” note format, rather than relying on LLMs to determine this for clinical practice.
UR - https://www.scopus.com/pages/publications/105028578476
U2 - 10.18653/v1/2025.findings-acl.626
DO - 10.18653/v1/2025.findings-acl.626
M3 - Conference contribution
AN - SCOPUS:105028578476
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 12084
EP - 12117
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -