TY - GEN
T1 - Residual-based Language Models are Free Boosters for Biomedical Imaging Tasks
AU - Lai, Zhixin
AU - Wu, Jing
AU - Chen, Suiyao
AU - Zhou, Yucheng
AU - Hovakimyan, Naira
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In this study, we uncover the unexpected efficacy of residual-based large language models (LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of language or textual data. The approach diverges from established methodologies by utilizing a frozen transformer block, extracted from pre-trained LLMs, as an innovative encoder layer for the direct processing of visual tokens. This strategy represents a significant departure from the standard multi-modal vision-language frameworks, which typically hinge on language-driven prompts and inputs. We found that these LLMs could boost performance across a spectrum of biomedical imaging applications, including both 2D and 3D visual classification tasks, serving as plug-and-play boosters. More interestingly, as a byproduct, we found that the proposed framework achieved superior performance, setting new state-of-the-art results on extensive, standardized datasets in MedMNIST-2D and 3D. Through this work, we aim to open new avenues for employing LLMs in biomedical imaging and enriching the understanding of their potential in this specialized domain. The code is available at https://github.com/ZhixinLai/LLMBoostMedical
AB - In this study, we uncover the unexpected efficacy of residual-based large language models (LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of language or textual data. The approach diverges from established methodologies by utilizing a frozen transformer block, extracted from pre-trained LLMs, as an innovative encoder layer for the direct processing of visual tokens. This strategy represents a significant departure from the standard multi-modal vision-language frameworks, which typically hinge on language-driven prompts and inputs. We found that these LLMs could boost performance across a spectrum of biomedical imaging applications, including both 2D and 3D visual classification tasks, serving as plug-and-play boosters. More interestingly, as a byproduct, we found that the proposed framework achieved superior performance, setting new state-of-the-art results on extensive, standardized datasets in MedMNIST-2D and 3D. Through this work, we aim to open new avenues for employing LLMs in biomedical imaging and enriching the understanding of their potential in this specialized domain. The code is available at https://github.com/ZhixinLai/LLMBoostMedical
KW - Biomedical Imaging
KW - LLM
UR - http://www.scopus.com/inward/record.url?scp=85202595241&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202595241&partnerID=8YFLogxK
U2 - 10.1109/CVPRW63382.2024.00515
DO - 10.1109/CVPRW63382.2024.00515
M3 - Conference contribution
AN - SCOPUS:85202595241
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 5086
EP - 5096
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE Computer Society
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Y2 - 16 June 2024 through 22 June 2024
ER -