Metrics including Cohen’s kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student’s affect or behaviour. This study examined these metrics for previously published student model examples to identify situations where the metrics provided differing perspectives on model performance. Simulated models were also used to show systematically how imbalanced class distributions, in both the data and the predictions, affect the values of the metrics and their chance levels (the values obtained by making random predictions). A random chance level for F1 was also established and evaluated. Results for the example student models showed that over-prediction of the class of interest (the positive class) was relatively common. Chance-level F1 was inflated by over-prediction; conversely, the maximum possible values of F1 and kappa were reduced by over-prediction of the positive class. Additionally, methods for normalizing F1 relative to chance are discussed and compared to kappa, demonstrating an equivalence between kappa and normalized F1. Finally, implications of the results for the choice of metrics are discussed in the context of common student modelling goals, such as avoiding false negatives for student states that are negatively related to learning.
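
As an illustration of the chance-level and ceiling effects summarized above, the following is a minimal sketch and not necessarily the formulation used in this study. It assumes a binary task, treats random predictions as statistically independent of the true labels, and computes each metric from expected proportions. The function names and the parameters `base_rate` (proportion of positive labels in the data) and `pred_rate` (proportion of positive predictions) are hypothetical, introduced only for this example.

```python
def chance_f1(base_rate, pred_rate):
    """F1 from expected proportions when predictions are independent of labels.

    Under independence, expected precision equals base_rate and expected
    recall equals pred_rate, so F1 is their harmonic mean.
    """
    if base_rate + pred_rate == 0:
        return 0.0
    return 2 * base_rate * pred_rate / (base_rate + pred_rate)


def max_f1(base_rate, pred_rate):
    """Largest F1 reachable when the positive prediction rate is fixed.

    True positives cannot exceed the smaller of the two rates.
    """
    if base_rate + pred_rate == 0:
        return 0.0
    return 2 * min(base_rate, pred_rate) / (base_rate + pred_rate)


def max_kappa(base_rate, pred_rate):
    """Largest Cohen's kappa reachable for the given marginal rates."""
    # Best-case observed agreement given mismatched marginals.
    p_o = 1 - abs(base_rate - pred_rate)
    # Agreement expected by chance from the marginals alone.
    p_e = base_rate * pred_rate + (1 - base_rate) * (1 - pred_rate)
    if p_e == 1:
        return 0.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    base_rate = 0.2  # assumed example value: 20% of cases are positive
    for pred_rate in (0.2, 0.4, 0.6, 0.8):
        print(
            f"pred_rate={pred_rate:.1f}  "
            f"chance F1={chance_f1(base_rate, pred_rate):.3f}  "
            f"max F1={max_f1(base_rate, pred_rate):.3f}  "
            f"max kappa={max_kappa(base_rate, pred_rate):.3f}"
        )
```

Under these assumptions, with a base rate of 0.2, raising the positive prediction rate from 0.2 to 0.6 raises chance-level F1 from 0.2 to 0.3 while lowering the maximum attainable F1 from 1.0 to 0.5; chance-level kappa stays at zero by construction.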