TY - JOUR
T1 - Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition
AU - Zhou, Huixue
AU - Austin, Robin
AU - Lu, Sheng-Chieh
AU - Silverman, Greg Marc
AU - Zhou, Yuqi
AU - Kilicoglu, Halil
AU - Xu, Hua
AU - Zhang, Rui
N1 - This work was supported by the National Center for Complementary and Integrative Health (NCCIH) (grant number R01AT009457) and the National Institute on Aging (NIA) (grant number R01AG078154). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
PY - 2024/2/1
Y1 - 2024/2/1
N2 - Objective: To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to apply state-of-the-art natural language processing (NLP) techniques to recognize them in the biomedical literature. Materials and methods: We constructed CIHLex by compiling and integrating data from the biomedical literature and other relevant knowledge sources. The lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We mapped these concepts to the Unified Medical Language System (UMLS), and we developed BERT models and compared their performance in CIH named entity recognition with that of well-established tools, including MetaMap and CLAMP, as well as the large language model GPT-3.5-turbo. Results: Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized under the semantic type "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing the other models. Conclusion: Our CIHLex substantially augments the representation of CIH approaches in the biomedical literature. BERT-based models notably excelled in CIH entity recognition, demonstrating the utility of advanced NLP models. These results highlight promising strategies for enhancing the standardization and recognition of CIH terminology in biomedical contexts.
AB - Objective: To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to apply state-of-the-art natural language processing (NLP) techniques to recognize them in the biomedical literature. Materials and methods: We constructed CIHLex by compiling and integrating data from the biomedical literature and other relevant knowledge sources. The lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We mapped these concepts to the Unified Medical Language System (UMLS), and we developed BERT models and compared their performance in CIH named entity recognition with that of well-established tools, including MetaMap and CLAMP, as well as the large language model GPT-3.5-turbo. Results: Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized under the semantic type "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing the other models. Conclusion: Our CIHLex substantially augments the representation of CIH approaches in the biomedical literature. BERT-based models notably excelled in CIH entity recognition, demonstrating the utility of advanced NLP models. These results highlight promising strategies for enhancing the standardization and recognition of CIH terminology in biomedical contexts.
KW - Complementary and Integrative Health
KW - named entity recognition
KW - Unified Medical Language System
KW - terminology
UR - http://www.scopus.com/inward/record.url?scp=85182779684&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182779684&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocad216
DO - 10.1093/jamia/ocad216
M3 - Article
C2 - 37952122
SN - 1527-974X
VL - 31
SP - 426
EP - 434
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 2
M1 - ocad216
ER -