TY - GEN
T1 - Am I Wrong, or Is the Autograder Wrong? Effects of AI Grading Mistakes on Learning
AU - Li, Tiffany Wenting
AU - Hsu, Silas
AU - Fowler, Max
AU - Zhang, Zhilin
AU - Zilles, Craig
AU - Karahalios, Karrie
N1 - A big thanks to the students that participated in our study. Additional thanks to the members of the Social Spaces group for their writing suggestions and feedback. This work was supported by the NSF (DUE 21-21424, NSF IIS-2016908), Capital One, and the UIUC College of Engineering Strategic Research Initiatives (SRI) grant. Tiffany Wenting Li was additionally supported by the Google Ph.D. fellowship.
PY - 2023/8/7
Y1 - 2023/8/7
N2 - Errors in AI grading and feedback often have an intractable set of causes and are, by their nature, difficult to completely avoid. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students' learning, we conducted surveys and interviews that recorded students' interactions with a short-answer AI autograder for "Explain in Plain English" code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for the learning impacts, including errors influencing participants' engagement with feedback and assessments of their answers' correctness, and participants' prior performance in the class. FPs harmed learning in large part due to participants' failures to detect the errors. This was due to participants not paying attention to the feedback after being marked as right, and an apparent bias against admitting one's answer was wrong once marked right. On the other hand, FNs harmed learning only for survey participants, suggesting that interviewees' greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and encourage deeper reflection on FNs to mitigate the learning harms of AI errors.
AB - Errors in AI grading and feedback often have an intractable set of causes and are, by their nature, difficult to completely avoid. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students' learning, we conducted surveys and interviews that recorded students' interactions with a short-answer AI autograder for "Explain in Plain English" code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for the learning impacts, including errors influencing participants' engagement with feedback and assessments of their answers' correctness, and participants' prior performance in the class. FPs harmed learning in large part due to participants' failures to detect the errors. This was due to participants not paying attention to the feedback after being marked as right, and an apparent bias against admitting one's answer was wrong once marked right. On the other hand, FNs harmed learning only for survey participants, suggesting that interviewees' greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and encourage deeper reflection on FNs to mitigate the learning harms of AI errors.
KW - AI error
KW - Bayesian modeling
KW - EiPE
KW - autograder
KW - automated short answer grading
KW - computer science education
KW - explain in plain English
KW - formative feedback
KW - human-AI interaction
UR - http://www.scopus.com/inward/record.url?scp=85174290094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174290094&partnerID=8YFLogxK
U2 - 10.1145/3568813.3600124
DO - 10.1145/3568813.3600124
M3 - Conference contribution
AN - SCOPUS:85174290094
T3 - ICER 2023 - Proceedings of the 2023 ACM Conference on International Computing Education Research V.1
SP - 159
EP - 176
BT - ICER 2023 - Proceedings of the 2023 ACM Conference on International Computing Education Research V.1
PB - Association for Computing Machinery
T2 - 19th Annual ACM International Computing Education Research Conference, ICER 2023
Y2 - 7 August 2023 through 11 August 2023
ER -