Language coverage for mismatched crowdsourcing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Developing automatic speech recognition technologies requires transcribed speech so as to learn the mapping from sound to text. It is traditionally assumed that transcribers need to be native speakers of the language being transcribed. Mismatched crowdsourcing is the transcription of speech by crowd workers who do not speak the language. Given that there are phonological similarities among different human languages, mismatched crowdsourcing does provide noisy data that can be aggregated to yield reliable labels. Here we discuss phonological properties of different languages in a coding-theoretic framework, and how nonnative phoneme misperception can be modeled as a noisy communication channel. We show the results of experiments demonstrating the efficacy of this information-theory-inspired modeling approach, having native English speakers and native Mandarin speakers transcribe Cantonese speech. Finally, we discuss how crowd workers whose native language background gives them the highest probability of faithful transcription can be found by solving a weighted set cover problem.
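
The abstract does not say how the weighted set cover instance is built or solved, so the following is only a minimal sketch of the generic problem, using the standard greedy approximation (pick the set with the lowest cost per newly covered element). The pool names, the distinction labels `d1` to `d5`, and the costs are hypothetical placeholders, not data or notation from the paper.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    universe: Set[str],
    subsets: Dict[str, Set[str]],
    costs: Dict[str, float],
) -> List[str]:
    """Greedy approximation for weighted set cover.

    Repeatedly selects the subset with the smallest cost per newly
    covered element until every element of `universe` is covered.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(subsets):
            gain = len(subsets[name] & uncovered)
            if gain == 0:
                continue
            ratio = costs[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy instance: each element stands for a phonological
# distinction of the target language, and each pool of transcribers
# (grouped by native language) is assumed to perceive some of them.
universe = {"d1", "d2", "d3", "d4", "d5"}
subsets = {
    "pool_A": {"d1", "d2", "d3"},
    "pool_B": {"d3", "d4"},
    "pool_C": {"d4", "d5"},
    "pool_D": {"d1", "d2", "d3", "d4", "d5"},
}
# Placeholder costs; the paper's actual weights are not given in the abstract.
costs = {"pool_A": 3.0, "pool_B": 1.0, "pool_C": 1.0, "pool_D": 6.0}

selected = greedy_weighted_set_cover(universe, subsets, costs)
print(selected, sum(costs[name] for name in selected))
```

The greedy rule is a well-known approximation and is not guaranteed to return the minimum-cost cover; an exact solution would need an integer program or exhaustive search on small instances.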

Original language: English (US)
Title of host publication: 2016 Information Theory and Applications Workshop, ITA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509025299
DOI: https://doi.org/10.1109/ITA.2016.7888198
State: Published - Mar 27 2017
Event: 2016 Information Theory and Applications Workshop, ITA 2016 - La Jolla, United States
Duration: Jan 31 2016 to Feb 5 2016

Publication series

Name: 2016 Information Theory and Applications Workshop, ITA 2016

Other

Other: 2016 Information Theory and Applications Workshop, ITA 2016
Country: United States
City: La Jolla
Period: 1/31/16 to 2/5/16

Fingerprint

Transcription
Information theory
Speech recognition
Labels
Acoustic waves
Experiments

Keywords

  • channel selection
  • distance distribution
  • mismatched crowdsourcing
  • phonology
  • set cover
  • speech transcription

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Artificial Intelligence
  • Information Systems
  • Signal Processing

Cite this

Varshney, L. R., Jyothi, P., & Hasegawa-Johnson, M. A. (2017). Language coverage for mismatched crowdsourcing. In 2016 Information Theory and Applications Workshop, ITA 2016 [7888198] (2016 Information Theory and Applications Workshop, ITA 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ITA.2016.7888198

Language coverage for mismatched crowdsourcing. / Varshney, Lav R; Jyothi, Preethi; Hasegawa-Johnson, Mark Allan.

2016 Information Theory and Applications Workshop, ITA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. 7888198 (2016 Information Theory and Applications Workshop, ITA 2016).

Varshney, LR, Jyothi, P & Hasegawa-Johnson, MA 2017, Language coverage for mismatched crowdsourcing. in 2016 Information Theory and Applications Workshop, ITA 2016., 7888198, 2016 Information Theory and Applications Workshop, ITA 2016, Institute of Electrical and Electronics Engineers Inc., 2016 Information Theory and Applications Workshop, ITA 2016, La Jolla, United States, 1/31/16. https://doi.org/10.1109/ITA.2016.7888198
Varshney LR, Jyothi P, Hasegawa-Johnson MA. Language coverage for mismatched crowdsourcing. In 2016 Information Theory and Applications Workshop, ITA 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7888198. (2016 Information Theory and Applications Workshop, ITA 2016). https://doi.org/10.1109/ITA.2016.7888198
Varshney, Lav R ; Jyothi, Preethi ; Hasegawa-Johnson, Mark Allan. / Language coverage for mismatched crowdsourcing. 2016 Information Theory and Applications Workshop, ITA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. (2016 Information Theory and Applications Workshop, ITA 2016).
@inproceedings{9fbece148ad643f98bbeec21734a3fef,
title = "Language coverage for mismatched crowdsourcing",
abstract = "Developing automatic speech recognition technologies requires transcribed speech so as to learn the mapping from sound to text. It is traditionally assumed that transcribers need to be native speakers of the language being transcribed. Mismatched crowdsourcing is the transcription of speech by crowd workers who do not speak the language. Given there are phonological similarities among different human languages, mismatched crowdsourcing does provide noisy data that can be aggregated to yield reliable labels. Here we discuss phonological properties of different languages in a coding-theoretic framework, and how nonnative phoneme misperception can be modeled as a noisy communication channel. We show the results of experiments demonstrating the efficacy of this information theory inspired modeling approach, having native English speakers and native Mandarin speakers transcribe Cantonese speech. Finally we discuss how crowd workers whose native language background give them the highest probability of faithful transcription can be found by solving a weighted set cover problem.",
keywords = "channel selection, distance distribution, mismatched crowdsourcing, phonology, set cover, speech transcription",
author = "Varshney, {Lav R} and Jyothi, Preethi and Hasegawa-Johnson, {Mark Allan}",
year = "2017",
month = "3",
day = "27",
doi = "10.1109/ITA.2016.7888198",
language = "English (US)",
series = "2016 Information Theory and Applications Workshop, ITA 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2016 Information Theory and Applications Workshop, ITA 2016",
address = "United States",

}

TY - GEN

T1 - Language coverage for mismatched crowdsourcing

AU - Varshney, Lav R

AU - Jyothi, Preethi

AU - Hasegawa-Johnson, Mark Allan

PY - 2017/3/27

Y1 - 2017/3/27

AB - Developing automatic speech recognition technologies requires transcribed speech so as to learn the mapping from sound to text. It is traditionally assumed that transcribers need to be native speakers of the language being transcribed. Mismatched crowdsourcing is the transcription of speech by crowd workers who do not speak the language. Given there are phonological similarities among different human languages, mismatched crowdsourcing does provide noisy data that can be aggregated to yield reliable labels. Here we discuss phonological properties of different languages in a coding-theoretic framework, and how nonnative phoneme misperception can be modeled as a noisy communication channel. We show the results of experiments demonstrating the efficacy of this information theory inspired modeling approach, having native English speakers and native Mandarin speakers transcribe Cantonese speech. Finally we discuss how crowd workers whose native language background give them the highest probability of faithful transcription can be found by solving a weighted set cover problem.

KW - channel selection

KW - distance distribution

KW - mismatched crowdsourcing

KW - phonology

KW - set cover

KW - speech transcription

UR - http://www.scopus.com/inward/record.url?scp=85013016573&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013016573&partnerID=8YFLogxK

U2 - 10.1109/ITA.2016.7888198

DO - 10.1109/ITA.2016.7888198

M3 - Conference contribution

AN - SCOPUS:85013016573

T3 - 2016 Information Theory and Applications Workshop, ITA 2016

BT - 2016 Information Theory and Applications Workshop, ITA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -