TY - JOUR
T1 - Evaluating IBM's Watson natural language processing artificial intelligence as a short-answer categorization tool for physics education research
AU - Campbell, Jennifer
AU - Ansell, Katherine A
AU - Stelzer, Tim
N1 - Publisher Copyright:
© 2024 authors. Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
PY - 2024/1
Y1 - 2024/1
N2 - Recent advances in publicly available natural language processors (NLPs) may enhance the efficiency of analyzing student short-answer responses in physics education research (PER). We train a state-of-the-art NLP, IBM's Watson, and test its agreement with human coders using two different studies that gathered text responses in which students explain their reasoning on physics-related questions. The first study analyzes 479 student responses to a lab data analysis question and categorizes them by main idea. The second study analyzes 732 student answers to identify the presence or absence of each of two conceptual themes. When training Watson with approximately one-third to one-half of the samples, we find that samples labeled with high confidence scores have accuracy similar to human agreement, yet for lower confidence scores, humans outperform the NLP's labeling accuracy. In addition to studying Watson's overall accuracy, we use this analysis to better understand factors that impact how Watson categorizes. Using the data from the categorization study, we find that Watson's algorithm does not appear to be impacted by the disproportionate representation of categories in the training set, and we examine mislabeled statements to identify vocabulary and phrasing that may increase the rate of false positives. Based on this work, we find that, with careful consideration of the research study design and an awareness of the NLP's limitations, Watson may be a useful tool for large-scale PER studies or classroom analysis tools.
UR - http://www.scopus.com/inward/record.url?scp=85188663424&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85188663424&partnerID=8YFLogxK
U2 - 10.1103/PhysRevPhysEducRes.20.010116
DO - 10.1103/PhysRevPhysEducRes.20.010116
M3 - Article
AN - SCOPUS:85188663424
SN - 2469-9896
VL - 20
JO - Physical Review Physics Education Research
JF - Physical Review Physics Education Research
IS - 1
M1 - 010116
ER -