TY - JOUR
T1 - BiomedRAG
T2 - A retrieval augmented large language model for biomedicine
AU - Li, Mingchen
AU - Kilicoglu, Halil
AU - Xu, Hua
AU - Zhang, Rui
N1 - This work was partially supported by the National Institutes of Health's National Center for Complementary and Integrative Health grant number R01AT009457, National Institute on Aging grant number R01AG078154 and National Cancer Institute grant number R01CA287413. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health. We acknowledge support from UMN's Center for Learning Health System Sciences. We thank Huixue Zhou for suggesting revisions to the method section and Chad Dupuis for solving the issues with our GPU server.
PY - 2025/2
Y1 - 2025/2
N2 - Retrieval-augmented generation (RAG) enhances the performance of large language models (LLMs) by retrieving knowledge from an established database. However, these models retrieve information at the sentence or paragraph level, potentially introducing noise and degrading generation quality. To address these issues, we propose a novel BiomedRAG framework that directly feeds automatically retrieved chunk-based documents into the LLM. Our evaluation of BiomedRAG across four biomedical natural language processing tasks using eight datasets demonstrates that our proposed framework not only improves performance by 9.95% on average, but also achieves state-of-the-art results, surpassing various baselines by 4.97%. BiomedRAG paves the way for more accurate and adaptable LLM applications in the biomedical domain.
AB - Retrieval-augmented generation (RAG) enhances the performance of large language models (LLMs) by retrieving knowledge from an established database. However, these models retrieve information at the sentence or paragraph level, potentially introducing noise and degrading generation quality. To address these issues, we propose a novel BiomedRAG framework that directly feeds automatically retrieved chunk-based documents into the LLM. Our evaluation of BiomedRAG across four biomedical natural language processing tasks using eight datasets demonstrates that our proposed framework not only improves performance by 9.95% on average, but also achieves state-of-the-art results, surpassing various baselines by 4.97%. BiomedRAG paves the way for more accurate and adaptable LLM applications in the biomedical domain.
KW - Large language model
KW - Retrieval-augmented generation
UR - http://www.scopus.com/inward/record.url?scp=85215130610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215130610&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2024.104769
DO - 10.1016/j.jbi.2024.104769
M3 - Article
C2 - 39814274
AN - SCOPUS:85215130610
SN - 1532-0464
VL - 162
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104769
ER -