Short answer scoring with GPT-4

Lan Jiang, Nigel Bosch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic short-answer scoring is a long-standing research problem in education. However, assessing short answers at human-level accuracy requires a deep understanding of natural language. Given the notable abilities of recent generative pre-trained transformer (GPT) models, we investigate gpt-4-1106-preview to automatically score student responses from the Automated Student Assessment Prize Short Answer Scoring dataset. We systematically varied information given to the model including possible correct answers and scoring examples, as well as the order of sub-tasks within short answer scoring (e.g., assigning a score vs. generating a rationale for an assigned score) to understand what affects short answer scoring. With the best configuration, GPT-4 yielded a quadratic weighted kappa of .677 across 10 questions. However, we observe that the performance differs across educational subjects (e.g., biology, English), the quality of scoring rubrics might affect the predictions, and the overall utility of rationales generated to explain scores is uncertain.
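
The quadratic weighted kappa reported above measures agreement between two raters on an ordinal scale, penalizing each disagreement by the squared distance between the two scores. The following is a minimal sketch of the standard definition, not the authors' evaluation code. The function name, the argument names, and the assumption that scores are integers from 0 to `num_labels - 1` are illustrative choices rather than details taken from the paper.

```python
def quadratic_weighted_kappa(human, model, num_labels):
    """Agreement between two lists of integer scores in [0, num_labels - 1].

    Hypothetical helper for illustration; assumes num_labels >= 2.
    Returns 1.0 for perfect agreement and about 0.0 for chance-level agreement.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(human)

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for h, m in zip(human, model):
        observed[h][m] += 1

    # Marginal score counts for each rater.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(num_labels))
                  for j in range(num_labels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic penalty, scaled so the largest disagreement weighs 1.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Count expected if the two raters scored independently.
            weighted_expected += weight * hist_human[i] * hist_model[j] / n

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one score only")

    return 1.0 - weighted_observed / weighted_expected
```

Calling `quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4)` gives 1.0, while fully reversed scores give a negative value. How the per-question values were aggregated into the single figure reported across 10 questions is not stated in the abstract, so this sketch covers one question at a time only.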

Original language: English (US)
Title of host publication: L@S 2024 - Proceedings of the 11th ACM Conference on Learning @ Scale
Publisher: Association for Computing Machinery
Pages: 438-442
Number of pages: 5
ISBN (Electronic): 9798400706332
State: Published - Jul 9 2024
Event: 11th ACM Conference on Learning @ Scale, L@S 2024 - Atlanta, United States
Duration: Jul 18 2024 to Jul 20 2024

Keywords

  • gpt (generative pre-trained transformer)
  • short answer scoring
  • text classification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
