Abstract
The Spontaneous Chinese Learner Speech Corpus consists of 185 hours of audio and video recordings obtained weekly from Chinese speech training classes at the University of Illinois at Urbana-Champaign from 2004 to 2009. The speakers in this corpus include 11 Chinese language teachers, 11 Korean-speaking learners, 23 English-speaking learners, and 86 Chinese heritage learners. Two paradigms, namely "Variety Show" and "Debate," were designed to fit in a 50-minute class. Speaker turns were marked with the video annotation software ELAN to provide speaker codes and precise time boundaries that divide the hour-long recordings into speech turns. Based on these turn markings, each snippet was displayed on a webpage to obtain a turn-synchronized transcription. The corpus data have been used for perceptual ratings and acoustic analysis of fluency and foreign accent, for language assessment, and for speech recognition, among other applications. The database is a rich resource of speech samples for a variety of research topics.
Original language | English (US) |
---|---|
Title of host publication | Learner Corpus Studies in Asia and the World |
Place of Publication | Japan |
Publisher | Kobe University |
Pages | 115-124 |
Volume | 2 |
State | Published - Jun 2014 |
Keywords
- spontaneous speech
- oral fluency
- foreign accent
- second language acquisition
- Chinese language learning