Accurate speech segmentation by mimicking human auditory processing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate the neural features of frequency synchrony and average signal level. Frequency synchrony and average signal level are used as input to a two-layered support vector machine (SVM)-based system to detect phone boundaries. Phone boundaries are detected with 87.0% precision and 84.8% recall when the automatic segmentation system has no prior knowledge of the phone sequence in the utterance.
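The precision and recall figures above are boundary-detection scores. As a rough illustration of how such scores are commonly computed, the sketch below matches each hypothesized boundary to at most one reference boundary within a tolerance window. The 20 ms tolerance, the greedy matching rule, and the names `score_boundaries` and `f1` are assumptions made for this example; the abstract does not state the paper's scoring protocol.

```python
def score_boundaries(reference_ms, hypothesis_ms, tolerance_ms=20):
    """Return (precision, recall) for detected boundaries.

    Times are integer milliseconds. A hypothesized boundary counts as a hit
    if it lies within tolerance_ms of a reference boundary that has not been
    matched already. The 20 ms default is an assumption, not a value taken
    from the paper.
    """
    ref = sorted(reference_ms)
    hyp = sorted(hypothesis_ms)
    hits = 0
    i = j = 0
    # Walk both sorted lists in step, pairing boundaries one-to-one.
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tolerance_ms:
            hits += 1
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1  # spurious detection (insertion)
        else:
            i += 1  # missed reference boundary (deletion)
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up boundary times (not data from the paper).
reference = [100, 250, 400, 600]
hypothesis = [110, 260, 500, 610, 800]
toy_precision, toy_recall = score_boundaries(reference, hypothesis)

# F1 derived from the abstract's 87.0% precision and 84.8% recall.
# This value is computed here for illustration; the abstract does not report it.
reported_f1 = f1(0.870, 0.848)
```

In the toy example three of the five hypothesized boundaries match, and three of the four reference boundaries are found. Applying `f1` to the abstract's figures gives a value of about 0.86.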

Original language: English (US)
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8096-8100
Number of pages: 5
DOI: https://doi.org/10.1109/ICASSP.2013.6639242
State: Published - Oct 18 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: May 26 2013 to May 31 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country: Canada
City: Vancouver, BC
Period: 5/26/13 to 5/31/13

Fingerprint

Biomimetics
Support vector machines
Processing

Keywords

  • Automatic segmentation
  • auditory modeling
  • average signal level
  • frequency synchrony

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

King, S., & Hasegawa-Johnson, M. (2013). Accurate speech segmentation by mimicking human auditory processing. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings (pp. 8096-8100). [6639242] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2013.6639242

Accurate speech segmentation by mimicking human auditory processing. / King, Sarah; Hasegawa-Johnson, Mark.

2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings. 2013. p. 8096-8100 6639242 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

King, S & Hasegawa-Johnson, M 2013, Accurate speech segmentation by mimicking human auditory processing. in 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings., 6639242, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 8096-8100, 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, 5/26/13. https://doi.org/10.1109/ICASSP.2013.6639242
King S, Hasegawa-Johnson M. Accurate speech segmentation by mimicking human auditory processing. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings. 2013. p. 8096-8100. 6639242. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2013.6639242
King, Sarah ; Hasegawa-Johnson, Mark. / Accurate speech segmentation by mimicking human auditory processing. 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings. 2013. pp. 8096-8100 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).
@inproceedings{5fa395f4b14a4d3bae32e1fb0c296220,
title = "Accurate speech segmentation by mimicking human auditory processing",
abstract = "This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate the neural features of frequency synchrony and average signal level. Frequency synchrony and average signal level are used as input to a two-layered support vector machine (SVM)-based system to detect phone boundaries. Phone boundaries are detected with 87.0{\%} precision and 84.8{\%} recall when the automatic segmentation system has no prior knowledge of the phone sequence in the utterance.",
keywords = "Automatic segmentation, auditory modeling, average signal level, frequency synchrony",
author = "Sarah King and Mark Hasegawa-Johnson",
year = "2013",
month = "10",
day = "18",
doi = "10.1109/ICASSP.2013.6639242",
language = "English (US)",
isbn = "9781479903566",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
pages = "8096--8100",
booktitle = "2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings",
}

TY  - GEN
T1  - Accurate speech segmentation by mimicking human auditory processing
AU  - King, Sarah
AU  - Hasegawa-Johnson, Mark
PY  - 2013/10/18
Y1  - 2013/10/18
N2  - This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate the neural features of frequency synchrony and average signal level. Frequency synchrony and average signal level are used as input to a two-layered support vector machine (SVM)-based system to detect phone boundaries. Phone boundaries are detected with 87.0% precision and 84.8% recall when the automatic segmentation system has no prior knowledge of the phone sequence in the utterance.
AB  - This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate the neural features of frequency synchrony and average signal level. Frequency synchrony and average signal level are used as input to a two-layered support vector machine (SVM)-based system to detect phone boundaries. Phone boundaries are detected with 87.0% precision and 84.8% recall when the automatic segmentation system has no prior knowledge of the phone sequence in the utterance.
KW  - Automatic segmentation
KW  - auditory modeling
KW  - average signal level
KW  - frequency synchrony
UR  - http://www.scopus.com/inward/record.url?scp=84890508656&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84890508656&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2013.6639242
DO  - 10.1109/ICASSP.2013.6639242
M3  - Conference contribution
AN  - SCOPUS:84890508656
SN  - 9781479903566
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 8096
EP  - 8100
BT  - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
ER  -