Construction of a rated speech corpus of L2 learners’ spontaneous speech

Su Youn Yoon, Lisa Pierce, Amanda Huensch, Eric Juul, Samantha Perkins, Richard Sproat, Mark Hasegawa-Johnson

Research output: Contribution to journal › Article › peer-review


This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. Spontaneous speech was collected from 28 L2 speakers representing six language backgrounds and five proficiency levels. Speech was elicited using formats similar to those of the TOEFL iBT and the Speaking Proficiency English Assessment Kit (SPEAK) test. A total of 182 minutes of spontaneous speech was collected, segmented, and assessed by two phonetically trained, experienced ESL instructors. The raters assigned a general fluency score and a phone accuracy score, along with detailed comments on pronunciation errors. The database was designed with several applications in mind: the development of computer-aided pronunciation and fluency training, the automatic assessment of fluency and pronunciation, and use as a resource for researchers working in automatic speech recognition and for linguists more generally. The database will be released to the public in the near future.

Original language: English (US)
Pages (from-to): 662-673
Number of pages: 12
Journal: CALICO Journal
Issue number: 3
State: Published - 2009

Keywords

  • Automated Scoring
  • L2
  • Rated Speech Corpus

ASJC Scopus subject areas

  • Education
  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
