Using LLM-Based Filtering to Develop Reliable Coding Schemes for Rare Debugging Strategies

Aysa Xuemo Fan, Qianhui Liu, Luc Paquette, Juan Pinto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Identifying and annotating student use of debugging strategies when solving computer programming problems can be a meaningful tool for studying and better understanding the development of debugging skills, which may lead to the design of effective pedagogical interventions. However, this process can be challenging when dealing with large datasets, especially when the strategies of interest are rare but important. This difficulty lies not only in the scale of the dataset but also in operationalizing these rare phenomena within the data. Operationalization requires annotators to first define how these rare phenomena manifest in the data and then obtain a sufficient number of positive examples to validate that this definition is reliable by accurately measuring Inter-Rater Reliability (IRR). This paper presents a method that leverages Large Language Models (LLMs) to efficiently exclude computer programming episodes that are unlikely to exhibit a specific debugging strategy. By using LLMs to filter out irrelevant programming episodes, this method focuses human annotation efforts on the most pertinent parts of the dataset, enabling experts to operationalize the coding scheme and reach IRR more efficiently.
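The workflow the abstract describes (filter out unlikely episodes, have humans annotate what remains, then measure agreement) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The `may_contain_strategy` callable is a hypothetical stand-in for an LLM judgment, and Cohen's kappa is assumed here as the IRR statistic, since the abstract does not name one.

```python
from typing import Callable, List, Sequence


def filter_episodes(
    episodes: Sequence[str],
    may_contain_strategy: Callable[[str], bool],
) -> List[str]:
    """Keep only episodes the filter does not rule out.

    `may_contain_strategy` is a hypothetical placeholder for an LLM call
    that answers "could this episode exhibit the target strategy?".
    """
    return [ep for ep in episodes if may_contain_strategy(ep)]


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    a, b = list(rater_a), list(rater_b)
    n = len(a)
    # Proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Toy usage: a keyword check stands in for the LLM filter (assumption).
episodes = ["added print statement", "renamed variable", "print then rerun"]
kept = filter_episodes(episodes, lambda ep: "print" in ep)

# Two annotators code four filtered episodes (1 = strategy present).
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

When positive cases are rare, nearly all labels are negative and chance agreement approaches 1, which makes kappa unstable. Concentrating annotation on episodes that survive the filter raises the share of positive examples the raters see.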

Original language: English (US)
Title of host publication: Advances in Quantitative Ethnography - 6th International Conference, ICQE 2024, Proceedings
Editors: Yoon Jeon Kim, Zachari Swiecki
Publisher: Springer
Pages: 136-151
Number of pages: 16
ISBN (Print): 9783031763342
State: Published - 2024
Event: 6th International Conference on Quantitative Ethnography, ICQE 2024 - Philadelphia, United States
Duration: Nov 3 2024 → Nov 7 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2278 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 6th International Conference on Quantitative Ethnography, ICQE 2024
Country/Territory: United States
City: Philadelphia
Period: 11/3/24 → 11/7/24

Keywords

  • Inter-Rater Reliability
  • Large Language Models
  • Programming Education

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
