Evaluating IBM's Watson natural language processing artificial intelligence as a short-answer categorization tool for physics education research

Jennifer Campbell, Katherine A Ansell, Tim Stelzer

Research output: Contribution to journal (peer-reviewed article)

Abstract

Recent advances in publicly available natural language processors (NLPs) may make the analysis of student short-answer responses in physics education research (PER) more efficient. We train a state-of-the-art NLP, IBM's Watson, and test its agreement with human coders using two studies that gathered text responses in which students explain their reasoning on physics-related questions. The first study analyzes 479 student responses to a lab data analysis question and categorizes them by main idea. The second study analyzes 732 student answers to identify the presence or absence of each of two conceptual themes. When Watson is trained on approximately one-third to one-half of the samples, we find that samples labeled with high confidence scores have accuracy similar to human agreement; for lower confidence scores, however, human agreement exceeds the NLP's labeling accuracy. Beyond Watson's overall accuracy, we use this analysis to better understand the factors that affect how Watson categorizes responses. Using data from the categorization study, we find that Watson's algorithm does not appear to be affected by the disproportionate representation of categories in the training set, and we examine mislabeled statements to identify vocabulary and phrasing that may increase the rate of false positives. Based on this work, we find that, with careful consideration of the research study design and an awareness of the NLP's limitations, Watson may be a useful tool for large-scale PER studies or classroom analysis tools.
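The central comparison in the abstract (classifier accuracy at different confidence scores versus agreement between human coders) can be illustrated with a short sketch. This is a hypothetical illustration only: it does not call Watson or any IBM API, the record layout (`Labeled` with `human`, `predicted`, `confidence`) and the toy labels "A", "B", "C" are invented for the example, and the abstract does not state which agreement statistic the authors used, so simple percent agreement and Cohen's kappa are shown as common choices.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Labeled:
    human: str         # category assigned by a human coder (treated as the reference)
    predicted: str     # category returned by the classifier (hypothetical)
    confidence: float  # classifier confidence score in [0, 1] (hypothetical)


def accuracy_by_confidence(items, thresholds):
    """For each threshold t, return (t, n, accuracy) over items with confidence >= t.

    Accuracy is None when no item reaches the threshold.
    """
    results = []
    for t in thresholds:
        kept = [it for it in items if it.confidence >= t]
        if not kept:
            results.append((t, 0, None))
            continue
        correct = sum(it.human == it.predicted for it in kept)
        results.append((t, len(kept), correct / len(kept)))
    return results


def percent_agreement(coder_a, coder_b):
    """Fraction of responses on which two coders assign the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of responses")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal rate, summed over categories.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data, not taken from the study.
items = [
    Labeled("A", "A", 0.95),
    Labeled("B", "B", 0.90),
    Labeled("A", "A", 0.85),
    Labeled("C", "A", 0.40),
    Labeled("B", "C", 0.35),
    Labeled("C", "C", 0.30),
]
coder_a = ["A", "B", "A", "C", "B", "C"]
coder_b = ["A", "B", "A", "C", "C", "C"]

if __name__ == "__main__":
    for t, n, acc in accuracy_by_confidence(items, [0.0, 0.5, 0.8]):
        print(f"confidence >= {t:.1f}: n={n}, accuracy={acc}")
    print("human percent agreement:", percent_agreement(coder_a, coder_b))
    print("human Cohen's kappa:", cohens_kappa(coder_a, coder_b))
```

Reporting accuracy as a function of a confidence cutoff makes the trade-off explicit: a higher cutoff keeps fewer machine-labeled responses but, in this toy data, those responses match the human labels more often, and the remainder could be routed to human coders.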

Original language: English (US)
Article number: 010116
Journal: Physical Review Physics Education Research
Volume: 20
Issue number: 1
State: Published - Jan 2024

ASJC Scopus subject areas

  • Education
  • General Physics and Astronomy
