Abstract

Developing automatic speech recognition technologies requires transcribed speech in order to learn the mapping from sound to text. It is traditionally assumed that transcribers need to be native speakers of the language being transcribed. Mismatched crowdsourcing is the transcription of speech by crowd workers who do not speak the language. Because different human languages share phonological similarities, mismatched crowdsourcing provides noisy data that can be aggregated to yield reliable labels. Here we discuss phonological properties of different languages in a coding-theoretic framework, and how nonnative phoneme misperception can be modeled as a noisy communication channel. We show the results of experiments demonstrating the efficacy of this information-theory-inspired modeling approach, in which native English speakers and native Mandarin speakers transcribe Cantonese speech. Finally, we discuss how crowd workers whose native language background gives them the highest probability of faithful transcription can be found by solving a weighted set cover problem.
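
The abstract does not state how the weighted set cover problem is formulated or solved. As an illustration only, the sketch below applies the standard greedy approximation for weighted set cover to a toy instance. The contrast labels (`c1` to `c4`), the language labels (`A`, `B`, `C`), and the weights are all hypothetical and are not taken from the paper. Each language "covers" the target-language sound contrasts its speakers are assumed to perceive reliably, and each weight is an assumed cost of recruiting transcribers from that language.

```python
from typing import Dict, Hashable, List, Set


def greedy_weighted_set_cover(
    universe: Set[Hashable],
    subsets: Dict[str, Set[Hashable]],
    weights: Dict[str, float],
) -> List[str]:
    """Greedy approximation: repeatedly pick the subset with the lowest
    weight per newly covered element until everything is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, covered in subsets.items():
            gain = len(covered & uncovered)
            if gain == 0:
                continue  # this subset adds nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy instance (not data from the paper).
contrasts = {"c1", "c2", "c3", "c4"}
coverage = {
    "A": {"c1", "c2"},
    "B": {"c2", "c3", "c4"},
    "C": {"c1"},
}
cost = {"A": 2.0, "B": 3.0, "C": 0.5}

chosen = greedy_weighted_set_cover(contrasts, coverage, cost)
total_cost = sum(cost[name] for name in chosen)
# chosen == ["C", "B"], total_cost == 3.5
```

The greedy rule is a well-known approximation rather than an exact solver. On this toy instance it happens to return the cheapest cover, but that is not guaranteed in general.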

Original language: English (US)
Title of host publication: 2016 Information Theory and Applications Workshop, ITA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509025299
State: Published - Mar 27 2017
Event: 2016 Information Theory and Applications Workshop, ITA 2016 - La Jolla, United States
Duration: Jan 31 2016 to Feb 5 2016

Publication series

Name: 2016 Information Theory and Applications Workshop, ITA 2016

Other

Other: 2016 Information Theory and Applications Workshop, ITA 2016
Country/Territory: United States
City: La Jolla
Period: 1/31/16 to 2/5/16

Keywords

  • channel selection
  • distance distribution
  • mismatched crowdsourcing
  • phonology
  • set cover
  • speech transcription

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Artificial Intelligence
  • Information Systems
  • Signal Processing

Title: Language coverage for mismatched crowdsourcing