Using IBM’s Watson to automatically evaluate student short answer responses

Jennifer Campbell, Katie Ansell, Tim Stelzer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Recent advancements in natural language processing (NLP) have generated interest in using computers to assist in the coding and analysis of students’ short answer responses for PER or classroom applications. We train a state-of-the-art NLP system, IBM’s Watson, and test its agreement with human coders in three experimental cases. By exploring these cases, we begin to understand how Watson behaves with ideal and more realistic data, across different levels of training, and across different types of categorization tasks. We find that Watson’s self-reported confidence for categorizing samples is reasonably well aligned with its accuracy, although this alignment can be impacted by features of the data being analyzed. Based on these results, we discuss implications and suggest potential applications of this technology to education research.
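The confidence–accuracy alignment described in the abstract can be illustrated with a simple calibration check: bin predictions by the classifier's self-reported confidence and compare each bin's mean confidence to its observed agreement with human labels. The sketch below is a minimal, hypothetical illustration (the data is synthetic and the function name is invented); it is not code or data from the study.

```python
# Hypothetical sketch: checking whether a classifier's self-reported
# confidence aligns with its accuracy (agreement with human coders).
# The records below are synthetic, not data from the study.

def calibration_by_bin(records, n_bins=5):
    """Group (confidence, correct) pairs into equal-width confidence bins
    and return (mean_confidence, accuracy, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in records:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    summary = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            summary.append((mean_conf, accuracy, len(b)))
    return summary

# Each tuple: (self-reported confidence, did the label match the human coder?)
records = [(0.95, True), (0.90, True), (0.85, True), (0.80, True),
           (0.60, False), (0.55, True), (0.35, False), (0.30, False)]

for mean_conf, accuracy, n in calibration_by_bin(records):
    print(f"confidence ~{mean_conf:.2f}: accuracy {accuracy:.2f} (n={n})")
```

If confidence is well calibrated, the printed accuracy in each bin should track the bin's mean confidence; large gaps in particular bins would point to the data features the authors note can degrade this alignment.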

Original language: English (US)
Title of host publication: Physics Education Research Conference, 2022
Editors: Brian Frank, Dyan Jones, Qing Ryan
Publisher: American Association of Physics Teachers
Number of pages: 6
ISBN (Print): 9781931024389
State: Published - 2022
Event: Physics Education Research Conference, PERC 2022 - Grand Rapids, United States
Duration: Jul 13 2022 – Jul 14 2022

Publication series

Name: Physics Education Research Conference Proceedings
ISSN (Print): 1539-9028
ISSN (Electronic): 2377-2379


Conference: Physics Education Research Conference, PERC 2022
Country/Territory: United States
City: Grand Rapids

ASJC Scopus subject areas

  • Education
  • Physics and Astronomy (all)

