The SRI speech-based collaborative learning corpus

Colleen Richey, Cynthia D'Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg

Research output: Contribution to journal › Conference article › peer-review


We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for the investigation and measurement of how students collaborate together in small groups. This is a multi-speaker corpus containing high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-cancelling microphone. Each group was also recorded via a stereo microphone placed nearby. A total of 80 sessions were collected with the participation of 134 students. The average duration of a session was 20 minutes. All students spoke English; for some students, English was a second language. Sessions have been annotated with time stamps to indicate which mathematical problem the students were solving and which student was speaking. Sessions have also been hand annotated with common indicators of collaboration for each speaker (e.g., inviting others to contribute, planning) and the overall collaboration quality for each problem. The corpus will be useful to education researchers interested in collaborative learning and to speech researchers interested in children's speech, speech analytics, and speech diarization. The corpus, both audio and annotation, will be made available to researchers.

Original language: English (US)
Pages (from-to): 1550-1554
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: Sep 8 2016 → Sep 16 2016


Keywords

  • Automatic speech recognition
  • Children's speech
  • Collaborative learning
  • STEM education
  • Speech corpus

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
