An Approach to Improve k-Anonymization Practices in Educational Data Mining

Frank Stinar, Zihan Xiong, Nigel Bosch

Research output: Contribution to journal › Article › peer-review


Educational data mining has allowed for large improvements in educational outcomes and understanding of educational processes. However, there remains a constant tension between educational data mining advances and protecting student privacy while using educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and utility of educational datasets for downstream educational data mining tasks, nor how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means to improve it. Our method improves downstream machine learning accuracy versus baseline data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.
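The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: check whether a table is k-anonymous, and if it is not, remove the least important information first. Column suppression stands in here for whatever anonymization operations the paper actually uses. The function names, the toy records, and the importance scores are all hypothetical; in practice the scores might come from a feature-importance estimate for the downstream prediction task.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())

def suppress_least_important(rows, quasi_identifiers, importance, k):
    """Greedily replace whole quasi-identifier columns with '*', least important
    first, until the table is k-anonymous (illustrative stand-in, not the
    paper's algorithm). If the table has fewer than k rows, the result will
    still not be k-anonymous even with every column suppressed."""
    anonymized = [dict(row) for row in rows]  # copy so the input is untouched
    suppressed = []
    for column in sorted(quasi_identifiers, key=lambda q: importance[q]):
        if is_k_anonymous(anonymized, quasi_identifiers, k):
            break
        for row in anonymized:
            row[column] = "*"
        suppressed.append(column)
    return anonymized, suppressed

# Hypothetical toy records; "passed" is the outcome a downstream model would predict.
rows = [
    {"age": 18, "zip": "A", "major": "CS", "passed": 1},
    {"age": 18, "zip": "B", "major": "CS", "passed": 0},
    {"age": 19, "zip": "A", "major": "CS", "passed": 1},
    {"age": 19, "zip": "B", "major": "CS", "passed": 1},
]
quasi_identifiers = ["age", "zip", "major"]
# Assumed importance scores (made up for illustration).
importance = {"age": 0.10, "zip": 0.05, "major": 0.60}

anonymized, suppressed = suppress_least_important(rows, quasi_identifiers, importance, k=2)
print(suppressed)
print(is_k_anonymous(anonymized, quasi_identifiers, 2))
```

In this toy case only the lowest-scoring column needs to be suppressed to reach 2-anonymity, so the columns assumed to matter more for prediction survive unchanged.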

Original language: English (US)
Pages (from-to): 61-83
Number of pages: 23
Journal: Journal of Educational Data Mining
Issue number: 1
State: Published - 2024


Keywords

  • data sharing
  • machine learning
  • student privacy

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Artificial Intelligence

