Using Problem Similarity- and Order-based Weighting to Model Learner Performance in Introductory Computer Science Problems

Yingbin Zhang, Aysa Xuemo Fan, Juan D. Pinto, Luc Paquette

Research output: Contribution to journal › Article › peer-review


The second CSEDM data challenge aimed to find innovative methods for using students’ programming traces to model their learning. The main challenge in this task is deciding which past problems are relevant for predicting performance on a future problem. This paper proposes a set of weighting schemes to address this challenge. Specifically, students’ behaviors and performance on past problems were weighted when predicting performance on future problems. The weight of a past problem was proportional to its similarity to the future problem. Problem similarity was quantified in terms of source code, problem prompts, and struggling patterns. In addition, we considered another weighting scheme in which past problems were weighted by the order in which students attempted them. Prior studies have used problem similarity and order information in learner modeling, but the proposed weighting schemes are more flexible in capturing problem similarity across various problem properties and in weighting various behaviors and performance information on past problems. We systematically investigated the utility of the weighting schemes for performance prediction through two analyses. The first analysis found that the weighting schemes based on source code similarity, struggling pattern similarity, and problem order improved prediction performance, but the weighting scheme based on problem prompts did not. The second analysis found that the weighting schemes allow a simple and interpretable model, such as logistic regression, to achieve performance comparable to state-of-the-art deep-learning models. We discuss the implications of the weighting schemes for learner modeling and suggest directions for further improvement.
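The idea of weighting past problems can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the exponential recency decay, and the example numbers are all assumptions made for illustration. The abstract states only that a past problem's weight is proportional to its similarity to the future problem, and that an alternative scheme weights past problems by attempt order.

```python
def similarity_weights(similarities):
    """Turn non-negative similarity scores into weights that sum to 1.

    Each past problem's weight is proportional to its similarity
    to the target (future) problem.
    """
    total = sum(similarities)
    if total == 0:
        # No past problem resembles the target: fall back to uniform weights.
        return [1.0 / len(similarities)] * len(similarities)
    return [s / total for s in similarities]


def order_weights(n_past, decay=0.5):
    """Weight past problems by attempt order, favoring recent attempts.

    Exponential decay is an illustrative assumption, not the paper's scheme.
    Index 0 is the earliest attempt; index n_past - 1 is the most recent.
    """
    raw = [decay ** (n_past - 1 - i) for i in range(n_past)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_feature(values, weights):
    """Aggregate a per-problem measure into a single weighted feature."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical student history: correctness on three past problems,
# listed in the order they were attempted.
past_correct = [1, 0, 1]
# Hypothetical similarity of each past problem to the target problem.
sim_to_target = [0.2, 0.6, 0.2]

sim_feature = weighted_feature(past_correct, similarity_weights(sim_to_target))
order_feature = weighted_feature(past_correct, order_weights(len(past_correct)))
```

Features such as `sim_feature` and `order_feature` could then serve as inputs to an interpretable classifier such as logistic regression. Under the similarity scheme, the failed second problem dominates because it most resembles the target; under the order scheme, the most recent success dominates instead.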

Original language: English (US)
Pages (from-to): 63-99
Number of pages: 37
Journal: Journal of Educational Data Mining
Issue number: 1
State: Published - 2023


Keywords

  • knowledge tracing
  • learner modeling
  • performance prediction
  • problem similarity
  • programming trace

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Artificial Intelligence

