TY - GEN
T1 - Using LLM-Based Filtering to Develop Reliable Coding Schemes for Rare Debugging Strategies
AU - Fan, Aysa Xuemo
AU - Liu, Qianhui
AU - Paquette, Luc
AU - Pinto, Juan
N1 - This study is funded by National Science Foundation Award #1942962.
PY - 2024
Y1 - 2024
AB - Identifying and annotating student use of debugging strategies when solving computer programming problems can be a meaningful tool for studying and better understanding the development of debugging skills, which may lead to the design of effective pedagogical interventions. However, this process can be challenging when dealing with large datasets, especially when the strategies of interest are rare but important. This difficulty lies not only in the scale of the dataset but also in operationalizing these rare phenomena within the data. Operationalization requires annotators to first define how these rare phenomena manifest in the data and then obtain a sufficient number of positive examples to validate that this definition is reliable by accurately measuring Inter-Rater Reliability (IRR). This paper presents a method that leverages Large Language Models (LLMs) to efficiently exclude computer programming episodes that are unlikely to exhibit a specific debugging strategy. By using LLMs to filter out irrelevant programming episodes, this method focuses human annotation efforts on the most pertinent parts of the dataset, enabling experts to operationalize the coding scheme and reach IRR more efficiently.
KW - Inter-Rater Reliability
KW - Large Language Models
KW - Programming Education
UR - http://www.scopus.com/inward/record.url?scp=85208723277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85208723277&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-76335-9_10
DO - 10.1007/978-3-031-76335-9_10
M3 - Conference contribution
AN - SCOPUS:85208723277
SN - 9783031763342
T3 - Communications in Computer and Information Science
SP - 136
EP - 151
BT - Advances in Quantitative Ethnography - 6th International Conference, ICQE 2024, Proceedings
A2 - Kim, Yoon Jeon
A2 - Swiecki, Zachari
PB - Springer
T2 - 6th International Conference on Quantitative Ethnography, ICQE 2024
Y2 - 3 November 2024 through 7 November 2024
ER -