Bayesian separation of audio-visual speech sources

Shyamsundar Rajaram, Ara V. Nefian, Thomas S. Huang

Research output: Contribution to journal › Conference article

Abstract

In this paper we investigate the use of audio and visual rather than only audio features for the task of speech separation in acoustically noisy environments. The success of existing independent component analysis (ICA) systems for the separation of a large variety of signals, including speech, is often limited by the ability of this technique to handle noise. In this paper, we introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both the ICA and the audio-only Bayesian model at all levels of noise.
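The abstract positions the proposed Bayesian model against an ICA baseline. As a rough illustration of that baseline only (not the paper's Bayesian method), the sketch below runs a minimal FastICA-style fixed-point demixing in NumPy on toy synthetic sources rather than speech; all signal choices and parameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy non-Gaussian sources at different frequencies (stand-ins for speech).
n = 2000
t = np.linspace(0, 8, n)
s1 = np.sin(2 * np.pi * 5 * t)                # sine wave
s2 = np.sign(np.sin(2 * np.pi * 7.3 * t))     # square wave
S = np.stack([s1, s2])                        # (2, n) true sources

A = np.array([[1.0, 0.5], [0.6, 1.0]])        # "unknown" mixing matrix
X = A @ S                                     # observed mixtures

# Whiten the mixtures: zero mean, identity covariance.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ X

# FastICA fixed-point iterations with tanh nonlinearity, deflation scheme.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wz = w @ Z
        g, gp = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
        w_new = (Z * g).mean(axis=1) - gp.mean() * w
        for j in range(i):                    # deflate against found components
            w_new -= (w_new @ W[j]) * W[j]
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-8
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = W @ Z   # recovered sources, up to sign/scale/permutation

# Each recovered source should correlate strongly with one true source.
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(corr.max(axis=1))   # should be close to 1 for both sources
```

This is the noiseless, instantaneous-mixing case where ICA works well; the abstract's point is precisely that such separation degrades under acoustic noise, which motivates the audio-visual Bayesian model.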

Original language: English (US)
Pages (from-to): V-657-V-660
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 5
State: Published - Sep 27 2004
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que., Canada
Duration: May 17 2004 - May 21 2004

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Bayesian separation of audio-visual speech sources. / Rajaram, Shyamsundar; Nefian, Ara V.; Huang, Thomas S.

In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 5, 27.09.2004, p. V-657-V-660.


@article{36e2610049004a76898e0de0b26c7798,
title = "Bayesian separation of audio-visual speech sources",
abstract = "In this paper we investigate the use of audio and visual rather than only audio features for the task of speech separation in acoustically noisy environments. The success of existing independent component analysis (ICA) systems for the separation of a large variety of signals, including speech, is often limited by the ability of this technique to handle noise. In this paper, we introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both the ICA and the audio-only Bayesian model at all levels of noise.",
author = "Shyamsundar Rajaram and Nefian, {Ara V.} and Huang, {Thomas S.}",
year = "2004",
month = "9",
day = "27",
language = "English (US)",
volume = "5",
pages = "V-657--V-660",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Bayesian separation of audio-visual speech sources

AU - Rajaram, Shyamsundar

AU - Nefian, Ara V.

AU - Huang, Thomas S.

PY - 2004/9/27

Y1 - 2004/9/27

N2 - In this paper we investigate the use of audio and visual rather than only audio features for the task of speech separation in acoustically noisy environments. The success of existing independent component analysis (ICA) systems for the separation of a large variety of signals, including speech, is often limited by the ability of this technique to handle noise. In this paper, we introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both the ICA and the audio-only Bayesian model at all levels of noise.

AB - In this paper we investigate the use of audio and visual rather than only audio features for the task of speech separation in acoustically noisy environments. The success of existing independent component analysis (ICA) systems for the separation of a large variety of signals, including speech, is often limited by the ability of this technique to handle noise. In this paper, we introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both the ICA and the audio-only Bayesian model at all levels of noise.

UR - http://www.scopus.com/inward/record.url?scp=4544247264&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544247264&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:4544247264

VL - 5

SP - V-657

EP - V-660

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -