“Hello, [REDACTED]”: Protecting Student Privacy in Analyses of Online Discussion Forums

Nigel Bosch, R. Wes Crues, Najmuddin Shaik, Luc Paquette

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online courses often include discussion forums, which provide a rich source of data to better understand and improve students’ learning experiences. However, forum messages frequently contain private information that prevents researchers from analyzing these data. We present a method for discovering and redacting private information including names, nicknames, employers, hometowns, and contact information. The method uses set operations to restrict the list of words that might be private information; these candidate words are then confirmed as private or not private via manual annotation or machine learning. To test the method, two raters manually annotated a corpus of words from an online course’s discussion forum. We then trained an ensemble machine learning model to automate the annotation task, achieving 95.4% recall and .979 AUC (area under the receiver operating characteristic curve) on a held-out dataset obtained from the same course offered 2 years later, and 97.0% recall and .956 AUC on a held-out dataset from a different online course. This work was motivated by research questions about students’ interactions with online courses that proved unanswerable without access to anonymized forum data, which we discuss. Finally, we asked two online course instructors for their perspectives on this work and report their views on additional potential applications.
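
As an illustration only, the general idea of narrowing candidates with set operations and then redacting confirmed words can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not state which word lists or set operations were used, so the `common_vocab` and `known_names` lists, the specific difference/intersection/union combination, and the function names `candidate_words` and `redact` are all assumptions. The confirmation step (manual annotation or a classifier) is replaced here by simply accepting every candidate.

```python
import re

# Simple word tokenizer; an assumption, not taken from the paper.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def tokenize(text):
    """Return lowercase word tokens found in text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def candidate_words(posts, common_vocab, known_names):
    """Restrict the forum vocabulary to words that might be private.

    Hypothetical rule: keep forum words absent from a general vocabulary
    list (set difference), plus forum words that appear on a list of
    known names (set intersection), combined with a union.
    """
    forum_vocab = set()
    for post in posts:
        forum_vocab.update(tokenize(post))
    return (forum_vocab - common_vocab) | (forum_vocab & known_names)


def redact(text, private_words):
    """Replace every token confirmed as private with [REDACTED]."""
    def repl(match):
        word = match.group(0)
        return "[REDACTED]" if word.lower() in private_words else word
    return TOKEN_RE.sub(repl, text)


# Toy data, invented for this example.
posts = [
    "Hello, Maria! I work at Acme in Springfield.",
    "Thanks for the reply, Will.",
]
common_vocab = {"hello", "i", "work", "at", "in", "thanks",
                "for", "the", "reply", "will"}
known_names = {"maria", "will"}

candidates = candidate_words(posts, common_vocab, known_names)

# Stand-in for the confirmation step: here every candidate is accepted.
confirmed_private = candidates

redacted_posts = [redact(p, confirmed_private) for p in posts]
```

In this toy setup, "will" is ordinary vocabulary but is retained as a candidate because it also appears on the names list, which is the kind of ambiguous word a later confirmation step would need to resolve in practice.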

Original language: English (US)
Title of host publication: Proceedings of the 13th International Conference on Educational Data Mining, EDM 2020
Editors: Anna N. Rafferty, Jacob Whitehill, Cristobal Romero, Violetta Cavalli-Sforza
Publisher: International Educational Data Mining Society
Pages: 39-49
Number of pages: 11
ISBN (Electronic): 9781733673617
State: Published - 2020
Event: 13th International Conference on Educational Data Mining, EDM 2020 - Virtual, Online
Duration: Jul 10 2020 to Jul 13 2020

Publication series

Name: Proceedings of the 13th International Conference on Educational Data Mining, EDM 2020

Conference

Conference: 13th International Conference on Educational Data Mining, EDM 2020
City: Virtual, Online
Period: 7/10/20 to 7/13/20

Keywords

  • Text anonymization
  • discussion forums
  • online learning

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
