A computational auditory scene analysis system for robust speech recognition

Soundararajan Srinivasan, Yang Shao, Zhaozhang Jin, De Liang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within that unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Unvoiced portions are segmented using an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting T-F masks are used with missing-data methods for recognition. Systematic evaluations on a speech separation challenge task show significant improvements over the baseline.
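The ideal binary mask definition above can be made concrete with a short sketch. It assumes per-unit target and interference energies are already available as frame-by-channel grids (for example from an auditory filterbank, which is not modeled here); the function names, the grid layout, and the `lc_db` local-criterion parameter are hypothetical illustrations, not the authors' implementation. With the default of 0 dB, a unit is kept exactly when the target energy is strictly greater than the interference energy, matching the "stronger than" wording.

```python
from typing import List

Matrix = List[List[float]]


def ideal_binary_mask(target: Matrix, interference: Matrix, lc_db: float = 0.0) -> List[List[int]]:
    """Label each T-F unit 1 if its local SNR exceeds lc_db, else 0.

    target[t][c] and interference[t][c] are (hypothetical) energies of each
    source in time frame t and frequency channel c. With lc_db = 0 this
    reduces to "target energy > interference energy".
    """
    if len(target) != len(interference):
        raise ValueError("target and interference must have the same number of frames")
    ratio = 10.0 ** (lc_db / 10.0)  # convert the dB criterion to an energy ratio
    mask = []
    for s_row, n_row in zip(target, interference):
        if len(s_row) != len(n_row):
            raise ValueError("frames must have the same number of channels")
        # Compare s > ratio * n instead of taking logs, so zero energies are safe.
        mask.append([1 if s > ratio * n else 0 for s, n in zip(s_row, n_row)])
    return mask


def apply_mask(mixture: Matrix, mask: List[List[int]]) -> Matrix:
    """Keep mixture units labeled 1 and zero out units labeled 0."""
    return [[x * m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(mixture, mask)]


if __name__ == "__main__":
    target = [[4.0, 0.5, 0.0], [1.0, 2.0, 3.0]]
    interference = [[1.0, 0.5, 2.0], [3.0, 0.0, 3.0]]
    mask = ideal_binary_mask(target, interference)
    print(mask)  # [[1, 0, 0], [0, 1, 0]]

    mixture = [[5.0, 1.0, 2.0], [4.0, 2.0, 6.0]]  # illustrative values only
    print(apply_mask(mixture, mask))  # [[5.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
```

Ties (equal energies) are labeled 0 because the definition requires the target to be strictly stronger. In the system described above the true energies are unknown, so the mask is estimated rather than computed this way; the sketch only shows the target the estimation aims for.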

Original language: English (US)
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 73-76
Number of pages: 4
ISBN (Print): 9781604234497
State: Published - Jan 1 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: Sep 17 2006 to Sep 21 2006

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 1
ISSN (Electronic): 1990-9772

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 9/17/06 to 9/21/06

Keywords

  • Binary time-frequency mask
  • Computational auditory scene analysis
  • Robust speech recognition
  • Speech segregation

ASJC Scopus subject areas

  • Computer Science (all)
