TY - JOUR
T1 - Metrics for Discrete Student Models
T2 - Chance Levels, Comparisons, and Use Cases
AU - Bosch, Nigel
AU - Paquette, Luc
PY - 2018
N2 - Metrics including Cohen’s kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student’s affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on model performance. Simulated models also systematically showed the effects of imbalanced class distributions in both data and predictions, in terms of the values of metrics and the chance levels (values obtained by making random predictions) for those metrics. Random chance level for F1 was also established and evaluated. Results for example student models showed that over-prediction of the class of interest (positive class) was relatively common. Chance-level F1 was inflated by over-prediction; conversely, maximum possible values for F1 and kappa were negatively impacted by over-prediction of the positive class. Additionally, normalization methods for F1 relative to chance are discussed and compared to kappa, demonstrating an equivalence between kappa and normalized F1. Finally, implications of results for choice of metrics are discussed in the context of common student modelling goals, such as avoiding false negatives for student states that are negatively related to learning.
DO - 10.18608/jla.2018.52.6
M3 - Article
SN - 1929-7750
VL - 5
SP - 86
EP - 104
JO - Journal of Learning Analytics
JF - Journal of Learning Analytics
IS - 2
ER -