Transcribing continuous speech using mismatched crowdsourcing

Research output: Contribution to journal › Conference article

Abstract

Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech segments. In order to scale mismatched crowdsourcing to continuous speech, we propose a number of new WFST pruning techniques based on explicitly low-entropy models of the acoustic similarities among orthographic symbols as understood within a transcriber community. We also provide an information-theoretic analysis and estimate the amount of information lost in transcription by the mismatched crowd workers to be under 5 bits.
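The abstract's "under 5 bits" figure refers to an information-theoretic quantity: the information about the source speech that is lost in the noisy channel from speaker to mismatched transcriber, i.e. the equivocation H(source | transcription). As an illustrative sketch (not code from the paper), this quantity can be computed from a joint distribution over source and transcribed symbols; the toy channels below are hypothetical:

```python
import math

def equivocation(joint):
    """Information lost in a noisy channel: H(X|Y) = H(X,Y) - H(Y).

    `joint[x][y]` is the joint probability P(X=x, Y=y) of source
    symbol x and transcribed symbol y. Returns a value in bits.
    """
    # Marginal distribution over the transcription symbols Y.
    p_y = [sum(joint[x][y] for x in range(len(joint)))
           for y in range(len(joint[0]))]
    # Joint entropy H(X,Y) and marginal entropy H(Y), both in bits.
    h_xy = -sum(p * math.log2(p) for row in joint for p in row if p > 0)
    h_y = -sum(p * math.log2(p) for p in p_y if p > 0)
    return h_xy - h_y

# Toy 2-symbol channels (hypothetical, for illustration only):
perfect = [[0.5, 0.0], [0.0, 0.5]]    # transcriber never errs: 0 bits lost
useless = [[0.25, 0.25], [0.25, 0.25]]  # transcription independent of source
```

A perfect transcriber yields an equivocation of 0 bits; a transcription statistically independent of the source loses all of H(X) (here 1 bit). The paper's analysis bounds this loss for real mismatched transcribers at under 5 bits.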

Original language: English (US)
Pages (from-to): 2774-2778
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
State: Published - Jan 1 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: Sep 6 2015 - Sep 10 2015

Keywords

  • Crowdsourcing
  • Information-theoretic Analysis
  • Noisy Channel Models
  • Speech transcriptions

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Transcribing continuous speech using mismatched crowdsourcing. / Jyothi, Preethi; Hasegawa-Johnson, Mark Allan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 01.01.2015, p. 2774-2778.

Research output: Contribution to journal › Conference article

@article{2151ca09cfdc443aa7b7e51a732abce7,
title = "Transcribing continuous speech using mismatched crowdsourcing",
abstract = "Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45{\%} in a large-vocabulary transcription task of short speech segments. In order to scale mismatched crowdsourcing to continuous speech, we propose a number of new WFST pruning techniques based on explicitly low-entropy models of the acoustic similarities among orthographic symbols as understood within a transcriber community. We also provide an information-theoretic analysis and estimate the amount of information lost in transcription by the mismatched crowd workers to be under 5 bits.",
keywords = "Crowdsourcing, Information-theoretic Analysis, Noisy Channel Models, Speech transcriptions",
author = "Preethi Jyothi and Hasegawa-Johnson, {Mark Allan}",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
volume = "2015-January",
pages = "2774--2778",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Transcribing continuous speech using mismatched crowdsourcing

AU - Jyothi, Preethi

AU - Hasegawa-Johnson, Mark Allan

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech segments. In order to scale mismatched crowdsourcing to continuous speech, we propose a number of new WFST pruning techniques based on explicitly low-entropy models of the acoustic similarities among orthographic symbols as understood within a transcriber community. We also provide an information-theoretic analysis and estimate the amount of information lost in transcription by the mismatched crowd workers to be under 5 bits.

AB - Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech segments. In order to scale mismatched crowdsourcing to continuous speech, we propose a number of new WFST pruning techniques based on explicitly low-entropy models of the acoustic similarities among orthographic symbols as understood within a transcriber community. We also provide an information-theoretic analysis and estimate the amount of information lost in transcription by the mismatched crowd workers to be under 5 bits.

KW - Crowdsourcing

KW - Information-theoretic Analysis

KW - Noisy Channel Models

KW - Speech transcriptions

UR - http://www.scopus.com/inward/record.url?scp=84959114771&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959114771&partnerID=8YFLogxK

M3 - Conference article

VL - 2015-January

SP - 2774

EP - 2778

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -