Auditory-modeling inspired methods of feature extraction for robust automatic speech recognition

Zhinian Jing, Mark Hasegawa-Johnson

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper proposes a technique for extracting robust feature vectors for automatic speech recognition (ASR), inspired by work on auditory modeling. The speech signal is first filtered through a bank of band-pass filters based on a model of the human cochlea. Autocorrelation functions (ACFs) are computed on the filter outputs, and each band's ACF is scaled by a corresponding voice index (VI), which exploits pitch-related information. The scaled ACFs are summed across bands, and feature vectors are computed by applying standard cepstral analysis to the summed ACF, treating it as a regular ACF. Finally, frame indices (FIs) weight the feature vectors in the time domain. The effectiveness of the proposed technique, relative to LPCC and MFCC features, is demonstrated through simple recognition experiments.
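The abstract only names the pipeline stages, so the Python sketch below shows one plausible reading of it, not the authors' exact method: a gammatone filterbank stands in for the cochlear filter model, the voice index is approximated by the height of the normalized ACF peak in the pitch-lag range, and cepstra are taken from the summed ACF via its power spectrum. The paper's precise VI and FI definitions are not given in the abstract, so those parts are placeholders, and the frame-index weighting across frames is omitted.

```python
# Sketch of the abstract's pipeline under the assumptions stated above.
import numpy as np
from scipy.signal import lfilter, gammatone  # gammatone requires SciPy >= 1.6

def band_acf(x, max_lag):
    """Biased autocorrelation of one band's output, lags 0..max_lag."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r[:max_lag + 1] / len(x)

def voice_index(r, fs, f0_min=80.0, f0_max=400.0):
    """Placeholder VI: normalized ACF peak inside a plausible pitch-lag range."""
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    if r[0] <= 0:
        return 0.0
    return max(0.0, np.max(r[lo:hi + 1]) / r[0])

def summed_acf_cepstrum(frame, fs, n_bands=20, n_ceps=13, max_lag=200):
    """VI-weighted summed ACF across gammatone bands, then cepstral analysis."""
    cfs = np.geomspace(100.0, 0.4 * fs, n_bands)   # band center frequencies
    total = np.zeros(max_lag + 1)
    for cf in cfs:
        b, a = gammatone(cf, "fir", fs=fs)         # cochlea-inspired band-pass
        y = lfilter(b, a, frame)
        r = band_acf(y, max_lag)
        total += voice_index(r, fs) * r            # scale each band's ACF by its VI
    # Treat the summed ACF like a regular ACF: its Fourier transform approximates
    # a power spectrum, so take log magnitude and invert to get cepstra.
    spec = np.abs(np.fft.rfft(total))
    ceps = np.fft.irfft(np.log(spec + 1e-12))
    return ceps[:n_ceps]

if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.025 * fs)) / fs            # one 25 ms frame
    frame = np.sin(2 * np.pi * 120 * t) + 0.1 * np.random.randn(len(t))
    print(summed_acf_cepstrum(frame, fs))
```

In a full front end, this per-frame computation would run over a sliding window, and the resulting feature-vector sequence would then be weighted by the frame indices described in the abstract.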

Original language: English (US)
Pages (from-to): IV/4176
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
State: Published - 2002
Event: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing - Orlando, FL, United States
Duration: May 13, 2002 - May 17, 2002

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
