TY - JOUR
T1 - CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
AU - Xia, Peng
AU - Chen, Ze
AU - Tian, Juanxi
AU - Gong, Yangrui
AU - Hou, Ruibo
AU - Xu, Yue
AU - Wu, Zhenbang
AU - Fan, Zhiyuan
AU - Zhou, Yiyang
AU - Zhu, Kangyu
AU - Zheng, Wenhao
AU - Wang, Zhaoyang
AU - Wang, Xiao
AU - Zhang, Xuchao
AU - Bansal, Chetan
AU - Niethammer, Marc
AU - Huang, Junzhou
AU - Zhu, Hongtu
AU - Li, Yun
AU - Sun, Jimeng
AU - Ge, Zongyuan
AU - Li, Gang
AU - Zou, James
AU - Yao, Huaxiu
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to Comprehensively evAluate the tRustworthinESs of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://cares-ai.github.io/.
AB - Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to Comprehensively evAluate the tRustworthinESs of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://cares-ai.github.io/.
UR - http://www.scopus.com/inward/record.url?scp=105000493321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105000493321&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:105000493321
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 9 December 2024 through 15 December 2024
ER -