Abstract

Immediately following the Second World War, between 1947 and 1955, several classic papers quantified the fundamentals of human speech information processing and recognition. In 1947, French and Steinberg published their classic study of the articulation index. In 1948, Claude Shannon published his famous work on the theory of information. In 1950, Fletcher and Galt published their theory of the articulation index, a theory Fletcher had worked on for 30 years, which integrated his classic work on loudness and speech perception with models of speech intelligibility. In 1951, George Miller wrote the first book on the subject, Language and Communication, analyzing human speech communication in terms of Claude Shannon's newly published theory of information. Finally, in 1955, George Miller published the first extensive analysis of phone decoding, in the form of confusion matrices, as a function of the speech-to-noise ratio. This work extended the Bell Labs speech articulation studies with ideas from Shannon's information theory. Both Miller and Fletcher showed that speech, as a code, is incredibly robust to the mangling distortions of filtering and noise. Regrettably, much of this early work was forgotten. While the science of information theory blossomed, it was rarely applied to aural speech research, apart from the work of George Miller. The robustness of speech, the most remarkable property of the speech code, has rarely been studied. It is my belief (i.e., my working assumption) that speech intelligibility can be analyzed with the scientific method. The quantitative analysis of speech intelligibility requires both science and art. The scientific component requires an error analysis of spoken communication, which depends critically on statistics, information theory, and psychophysical methods. The artistic component depends on knowing how to restrict the problem in such a way that progress can be made.
It is critical to tease out the relevant from the irrelevant and to dig for the key issues. This focuses us on the decoding of nonsense phonemes, with no visual component, that have been mangled by filtering and noise. This monograph is a summary and theory of human speech recognition. It builds on and integrates the work of Fletcher, Miller, and Shannon. The long-term goal is to develop a quantitative theory for predicting the recognition of speech sounds. In Chapter 2 the theory is developed for maximum-entropy (MaxEnt) speech sounds, also called nonsense speech. In Chapter 3, context is factored in. The book is largely reflective and quantitative, with a secondary goal of providing historical context, along with the many deep insights found in these early works.
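The confusion-matrix analysis mentioned above lends itself to a brief illustration. Miller's 1955 study tabulated how often each spoken phone was heard as each other phone; Shannon's mutual information between spoken and heard phones then measures how much of the speech code survives the noise. The sketch below is not from the book — the 3×3 matrix is made-up data for demonstration only — but it shows the kind of computation such an analysis involves.

```python
import numpy as np

# Hypothetical confusion matrix: C[i, j] counts how often spoken phone i
# was reported as phone j. (Made-up counts, for illustration only.)
C = np.array([[8.0, 1.0, 1.0],
              [1.0, 8.0, 1.0],
              [2.0, 2.0, 6.0]])

P = C / C.sum()                      # joint distribution P(spoken, heard)
px = P.sum(axis=1, keepdims=True)    # marginal over spoken phones
py = P.sum(axis=0, keepdims=True)    # marginal over heard phones

# Shannon mutual information, in bits per phone:
#   I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mask = P > 0                         # skip empty cells to avoid log(0)
bits = np.sum(P[mask] * np.log2(P[mask] / (px @ py)[mask]))
print(f"information transmitted: {bits:.3f} bits per phone")
```

As the speech-to-noise ratio drops, the off-diagonal counts grow and the transmitted information falls toward zero; a diagonal matrix would transmit the full entropy of the phone set.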

Original language: English (US)
Title of host publication: Synthesis Lectures on Speech and Audio Processing
Pages: 1-124
Number of pages: 124
DOI: 10.2200/S00004ED1V01Y200508SAP001
State: Published - Oct 1 2005

Publication series

Name: Synthesis Lectures on Speech and Audio Processing
Volume: 1
ISSN (Print): 1932-121X
ISSN (Electronic): 1932-1678

Keywords

  • Articulation index
  • Confusion matrix
  • Context models
  • Events
  • Features
  • Phone recognition
  • Robust speech recognition
  • Speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

Allen, J. B. (2005). Articulation and intelligibility. In Synthesis Lectures on Speech and Audio Processing (pp. 1-124). (Synthesis Lectures on Speech and Audio Processing; Vol. 1). https://doi.org/10.2200/S00004ED1V01Y200508SAP001

Research output: Chapter in Book/Report/Conference proceeding › Chapter

@inbook{66f4592032a946b081487471478b7a6d,
title = "Articulation and intelligibility",
keywords = "Articulation index, Confusion matrix, Context models, Events, Features, Phone recognition, Robust speech recognition, Speech recognition",
author = "Allen, {Jont B.}",
year = "2005",
month = "10",
day = "1",
doi = "10.2200/S00004ED1V01Y200508SAP001",
language = "English (US)",
isbn = "1598290657",
series = "Synthesis Lectures on Speech and Audio Processing",
pages = "1--124",
booktitle = "Synthesis Lectures on Speech and Audio Processing",

}

TY - CHAP

T1 - Articulation and intelligibility

AU - Allen, Jont B.

PY - 2005/10/1

Y1 - 2005/10/1

KW - Articulation index

KW - Confusion matrix

KW - Context models

KW - Events

KW - Features

KW - Phone recognition

KW - Robust speech recognition

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=33751104255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751104255&partnerID=8YFLogxK

U2 - 10.2200/S00004ED1V01Y200508SAP001

DO - 10.2200/S00004ED1V01Y200508SAP001

M3 - Chapter

AN - SCOPUS:33751104255

SN - 1598290657

SN - 9781598290653

T3 - Synthesis Lectures on Speech and Audio Processing

SP - 1

EP - 124

BT - Synthesis Lectures on Speech and Audio Processing

ER -