Accurate speech segmentation by mimicking human auditory processing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate two neural features: frequency synchrony and average signal level. These features are the input to a two-layered support vector machine (SVM)-based system that detects phone boundaries. When the automatic segmentation system has no prior knowledge of the phone sequence in the utterance, it detects phone boundaries with 87.0% precision and 84.8% recall.
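
As a rough illustration of how boundary precision and recall figures like those above are commonly computed, the sketch below matches detected boundaries to reference boundaries one-to-one within a time tolerance. The abstract does not state the matching rule or tolerance used in the paper; the 20 ms default and the function name `boundary_precision_recall` are assumptions made only for this example.

```python
from typing import Sequence, Tuple


def boundary_precision_recall(
    reference: Sequence[float],
    detected: Sequence[float],
    tolerance: float = 0.020,  # assumed 20 ms window; not taken from the paper
) -> Tuple[float, float]:
    """Return (precision, recall) for boundary times given in seconds.

    A detected boundary counts as correct when it lies within `tolerance`
    of a reference boundary that has not already been matched.
    """
    ref = sorted(reference)
    det = sorted(detected)
    matched = 0
    i = j = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance:
            matched += 1  # pair these two boundaries and move past both
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection is too early for this or any later reference
        else:
            i += 1  # reference is too early for this or any later detection
    precision = matched / len(det) if det else 0.0
    recall = matched / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical boundary times, for illustration only.
reference_times = [0.10, 0.25, 0.40, 0.60]
detected_times = [0.11, 0.26, 0.33, 0.61, 0.80]
precision, recall = boundary_precision_recall(reference_times, detected_times)
```

With these made-up times, three of the five detections fall within the window of a reference boundary, so precision is 3/5 and recall is 3/4.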

Original language: English (US)
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8096-8100
Number of pages: 5
State: Published - Oct 18 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: May 26 2013 → May 31 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 5/26/13 → 5/31/13

Keywords

  • Automatic segmentation
  • auditory modeling
  • average signal level
  • frequency synchrony

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
