Evaluating Large Language Model Code Generation as an Autograding Mechanism for "Explain in Plain English" Questions

David H. Smith, Craig Zilles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The ability to "Explain in Plain English" (EiPE) the purpose of code is a critical skill for students in introductory programming courses to develop. EiPE questions serve as a mechanism for students both to develop and to demonstrate code comprehension skills. However, evaluating this skill has been challenging, as manual grading is time-consuming and not easily automated. The process of constructing a prompt for code generation with a Large Language Model, such as OpenAI's GPT-4, bears a striking resemblance to constructing an EiPE response. In this paper, we explore the potential of using test cases run on code generated by GPT-4 from students' EiPE responses as a grading mechanism for EiPE questions. We applied this proposed grading method to a corpus of EiPE responses collected from past exams and measured agreement between its results and those of human graders. Overall, we find moderate agreement between the human raters and the results of the unit tests run on the generated code. The disagreement appears to be attributable to GPT-4's code generation being more lenient than human graders on low-level descriptions of code.
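
A minimal sketch of the grading pipeline described in the abstract may help make the mechanism concrete: a student's EiPE response is used as a code-generation prompt for GPT-4, and the generated function is marked correct only if it passes instructor-written test cases. The function names, prompt wording, and example test cases below are illustrative assumptions, not the authors' exact implementation.

# Hypothetical sketch of EiPE autograding via GPT-4 code generation.
# Assumes the official openai Python client (v1+) and an OPENAI_API_KEY
# in the environment; names and prompt text are illustrative only.

from openai import OpenAI

client = OpenAI()


def generate_code(eipe_response: str, function_name: str) -> str:
    """Ask GPT-4 to translate a plain-English description into a Python function."""
    prompt = (
        f"Write a Python function named `{function_name}` that does the following:\n"
        f"{eipe_response}\n"
        "Respond with only the code, no explanation or markdown."
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


def grade(generated_code: str, function_name: str,
          test_cases: list[tuple[tuple, object]]) -> bool:
    """Mark the response correct only if the generated code passes every test case."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)          # define the generated function
        func = namespace[function_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                             # any error counts as incorrect


# Example: an EiPE question whose reference code sums the numbers in a list.
tests = [(([1, 2, 3],), 6), (([],), 0)]
code = generate_code("It adds up all the numbers in the list and returns the total.",
                     "mystery")
print("correct" if grade(code, "mystery", tests) else "incorrect")

In practice the model's reply may need light cleanup (e.g., stripping stray markdown fences) before execution, and running generated code should be sandboxed; those details are omitted here.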

Original language: English (US)
Title of host publication: SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education
Publisher: Association for Computing Machinery
Pages: 1824-1825
Number of pages: 2
ISBN (Electronic): 9798400704246
DOIs
State: Published - Mar 14 2024
Event: 55th ACM Technical Symposium on Computer Science Education, SIGCSE 2024 - Portland, United States
Duration: Mar 20 2024 - Mar 23 2024

Publication series

Name: SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education
Volume: 2

Conference

Conference: 55th ACM Technical Symposium on Computer Science Education, SIGCSE 2024
Country/Territory: United States
City: Portland
Period: 3/20/24 - 3/23/24

Keywords

  • EIPE
  • GPT-4
  • autograding
  • large language models

ASJC Scopus subject areas

  • General Computer Science
  • Education
