Sequential organization of speech in computational auditory scene analysis

Yang Shao, De Liang Wang

Research output: Contribution to journal › Article › peer-review

Abstract

A human listener has the ability to follow a speaker's voice over time in the presence of other talkers and non-speech interference. This paper proposes a general system for sequential organization of speech based on speaker models. By training a general background model, the proposed system is shown to function well with both interfering talkers and non-speech intrusions. To deal with situations where prior information about specific speakers is not available, a speaker quantization method is employed to extract representative models from a large speaker space, and the obtained generic models are then used to perform sequential grouping. Our systematic evaluations show that grouping performance with generic models is only moderately lower than that achieved with known speaker models.
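The abstract does not spell out how the speaker quantization step is carried out, so the following is only a minimal sketch of the general idea: a large collection of pre-trained speaker models is compressed into a handful of representative "generic" models by clustering. It assumes each speaker model can be summarized as a fixed-length vector (for example, stacked Gaussian means), and it uses plain k-means as an illustrative stand-in for the paper's actual quantization procedure. The names `quantize_speakers`, `supervectors`, and `n_generic` are invented for this example and do not come from the paper.

```python
import numpy as np

def quantize_speakers(supervectors, n_generic=8, n_iter=50, seed=0):
    """Compress a large speaker space into a few generic speaker models.

    `supervectors` is an (n_speakers, dim) array; each row is assumed to be a
    fixed-length summary of one trained speaker model. The returned rows are
    cluster centroids that act as generic models. K-means is used here purely
    as an illustration, not as the paper's specific method.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(supervectors, dtype=float)
    # Initialize centroids by sampling distinct speakers.
    centroids = X[rng.choice(len(X), size=n_generic, replace=False)]
    for _ in range(n_iter):
        # Assign each speaker to its nearest generic model (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Re-estimate each generic model as the mean of its assigned speakers.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_generic)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Usage sketch with synthetic data: 200 hypothetical speakers, 512-dim summaries.
generic_models = quantize_speakers(np.random.randn(200, 512), n_generic=8)
```

In this reading, sequential grouping would then score speech segments against the small set of generic models instead of against models of the actual (unknown) speakers, which matches the paper's reported finding that generic models cost only a moderate drop in grouping performance.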

Original language: English (US)
Pages (from-to): 657-667
Number of pages: 11
Journal: Speech Communication
Volume: 51
Issue number: 8
DOIs
State: Published - Aug 2009
Externally published: Yes

Keywords

  • Binary time-frequency mask
  • Computational auditory scene analysis
  • Sequential organization
  • Speaker quantization

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
