A Design of the Spontaneous Chinese Learner Speech Corpus

Chen-huei Wu, Chilin Shih

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The Spontaneous Chinese Learner Speech Corpus consists of 185 hours of audio and video recordings, collected weekly from Chinese speech training classes at the University of Illinois at Urbana-Champaign from 2004 to 2009. The speakers in the corpus include 11 Chinese language teachers, 11 Korean-speaking learners, 23 English-speaking learners, and 86 Chinese heritage learners. Two paradigms, namely "Variety Show" and "Debate", were designed to fit in a 50-minute class. Speaker turns were marked with the video editing software ELAN to provide speaker codes and the precise time boundaries that divide the hour-long recordings into speech turns. Based on the turn markings, each snippet was displayed on a webpage to obtain a turn-synchronized transcription. The corpus data were used for perceptual ratings and acoustic analysis of fluency and foreign accent, language assessment, and speech recognition, among other purposes. The database is a rich resource of speech samples for a variety of research topics.
Original language: English (US)
Title of host publication: Learner Corpus Studies in Asia and the World
Place of Publication: Japan
Publisher: Kobe University
State: Published - Jun 2014


  • spontaneous speech
  • oral fluency
  • foreign accent
  • second language acquisition
  • Chinese language learning

