Mining MOOC lecture transcripts to construct concept dependency graphs

Fareedah ALSaad, Assma Boughoula, Chase Geigle, Hari Sundaram, Cheng Xiang Zhai

Research output: Contribution to conference, Paper, peer-review


This paper addresses the question of identifying a concept dependency graph for a MOOC through unsupervised analysis of lecture transcripts. The problem is important: extracting a concept graph is the first step in helping students with varying preparation to understand course material. The problem is challenging: instructors are unaware of the diversity in student preparation and may be unable to identify the right resolution of the concepts, necessitating costly updates; inferring concepts from groups suffers from polysemy; the temporal order of concepts depends on the concepts in question. We propose innovative unsupervised methods to discover a directed concept dependency within and between lectures. Our main technical innovation lies in exploiting the temporal ordering amongst concepts to discover the graph. We propose two measures, the Bridge Ensemble Measure and the Global Direction Measure, to infer the existence and the direction of the dependency relations between concepts. The bridge ensemble measure identifies concept overlap between lectures, determines concept co-occurrence within short windows, and identifies the lecture where each concept first occurs. The global direction measure incorporates time directly by analyzing the concept time ordering both globally and within lectures. Experiments over real-world MOOC data show that our method outperforms the baseline in both AUC and precision/recall curves.
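
The abstract names two ingredients, co-occurrence within short windows and the order in which concepts first appear, without giving formulas. The Python sketch below is only a simplified illustration of how those two signals could be combined. It is not the Bridge Ensemble Measure or the Global Direction Measure from the paper. The function names, the `window` and `min_count` parameters, the toy transcripts, and the rule "the concept mentioned first is the prerequisite" are all assumptions made for this example.

```python
def first_positions(lectures):
    """Map each concept to (lecture index, token index) of its first mention."""
    first = {}
    for li, tokens in enumerate(lectures):
        for ti, concept in enumerate(tokens):
            first.setdefault(concept, (li, ti))
    return first


def cooccurrence_counts(lectures, window=3):
    """Count pairs of distinct concepts that fall inside a span of `window` tokens."""
    counts = {}
    for tokens in lectures:
        for i, a in enumerate(tokens):
            # Look ahead at the next (window - 1) tokens of the same lecture.
            for b in tokens[i + 1 : i + window]:
                if a != b:
                    key = tuple(sorted((a, b)))
                    counts[key] = counts.get(key, 0) + 1
    return counts


def dependency_edges(lectures, window=3, min_count=1):
    """Return (prerequisite, dependent, count) triples.

    Edge existence: the pair co-occurs at least `min_count` times.
    Edge direction (assumed heuristic): the concept introduced earlier
    in the course points to the concept introduced later.
    """
    first = first_positions(lectures)
    edges = []
    for (a, b), n in sorted(cooccurrence_counts(lectures, window).items()):
        if n < min_count:
            continue
        src, dst = (a, b) if first[a] <= first[b] else (b, a)
        edges.append((src, dst, n))
    return edges


# Toy transcripts: each lecture is an ordered list of already-extracted concepts.
lectures = [
    ["probability", "bayes", "probability", "prior"],
    ["prior", "posterior", "bayes", "posterior"],
]

for src, dst, n in dependency_edges(lectures):
    print(f"{src} -> {dst} (co-occurrences: {n})")
```

In this toy setup the count stands in for edge existence and the first-mention order stands in for edge direction. A real pipeline would first need to extract concepts from raw transcript text, which this sketch takes as given.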

Original language: English (US)
State: Published - 2018
Event: 11th International Conference on Educational Data Mining, EDM 2018 - Buffalo, United States
Duration: Jul 15 2018 to Jul 18 2018


Conference: 11th International Conference on Educational Data Mining, EDM 2018
Country/Territory: United States


Keywords

  • Bridge ensemble measure
  • Concept dependency graph
  • Edge direction
  • Edge existence
  • Global direction measure
  • Temporal order

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems

