Audio-visual speaker localization using graphical models

  • Akash Kushal
  • Mandar Rahurkar
  • Fei Fei Li
  • Jean Ponce
  • Thomas Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work we propose an approach that combines audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and changes in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to handle cases where even both modalities together are not enough, e.g., when the person being tracked is occluded and not speaking. Improvements in tracking performance are shown at each step and compared against manually annotated ground truth.
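The abstract describes fusing audio and video observations in a graphical model while exploiting temporal correlations between adjacent frames. The paper's actual model is not given here, so the following is only a minimal illustrative sketch under assumed details: a 1-D grid of candidate speaker positions, per-frame audio and video likelihoods treated as conditionally independent, and a Gaussian motion model supplying the frame-to-frame correlation via a standard forward (filtering) pass.

```python
import numpy as np

# Hypothetical setup: discretize candidate speaker positions on a 1-D grid.
n_pos = 10
positions = np.arange(n_pos)

# Assumed motion model: the speaker tends to stay near the previous
# position (Gaussian over position change, rows normalized so that
# trans[i, j] = P(position j at t | position i at t-1)).
diff = positions[:, None] - positions[None, :]
trans = np.exp(-0.5 * (diff / 1.0) ** 2)
trans /= trans.sum(axis=1, keepdims=True)

def fuse_and_track(audio_lik, video_lik, trans):
    """Forward pass: propagate the belief through the motion model,
    multiply in both modality likelihoods (assumed conditionally
    independent given the position), and return the MAP position
    for each frame."""
    n_frames, n_pos = audio_lik.shape
    belief = np.full(n_pos, 1.0 / n_pos)  # uniform prior
    estimates = []
    for t in range(n_frames):
        # Predict: temporal correlation between adjacent frames.
        belief = trans.T @ belief
        # Update: fuse audio and video evidence for this frame.
        belief *= audio_lik[t] * video_lik[t]
        belief /= belief.sum()
        estimates.append(int(np.argmax(belief)))
    return estimates
```

The temporal prior is what lets the tracker coast through frames where one modality is uninformative, e.g. a uniform video likelihood during an occlusion while the audio still localizes the speaker.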

Original language: English (US)
Title of host publication: Proceedings - 18th International Conference on Pattern Recognition, ICPR 2006
Pages: 291-294
Number of pages: 4
DOIs
State: Published - 2006
Externally published: Yes
Event: 18th International Conference on Pattern Recognition, ICPR 2006 - Hong Kong, China
Duration: Aug 20 2006 - Aug 24 2006

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 1
ISSN (Print): 1051-4651

Other

Other: 18th International Conference on Pattern Recognition, ICPR 2006
Country/Territory: China
City: Hong Kong
Period: 8/20/06 - 8/24/06

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
